Abstract

Genotypic fitness landscapes are constructed by assessing the fitness of all possible combinations
of a given number of mutations. In recent years, several experimental fitness landscapes have
been completely resolved. As fitness landscapes are high-dimensional, their characterization
relies on simple measures of their structure, which can be used as statistics in empirical
applications. Here we propose two new sets of measures that explicitly capture two relevant
features of fitness landscapes: epistasis and constraints. The first set contains new measures
for epistasis based on the correlation of fitness effects of mutations. They have a natural
interpretation, capture well the interaction between mutations, can be obtained analytically for
most landscape models and can therefore be used to discriminate between different models. The
second set contains measures of evolutionary constraints based on “chains” of forced mutations
along fitness-increasing paths. Some of these measures are non-monotonic in the amount of
epistatic interactions and instead have a maximum for intermediate values. We further
characterize the relationships of these measures to those that were previously proposed (e.g.
number of peaks, roughness/slope, fraction of non-additive components, etc.). Finally, we show
how these measures can help uncover the amount and the nature of epistatic interactions in two
experimental landscapes.


1 Introduction 


Fitness landscapes have been a very successful metaphor to study evolution. In its simplest form,
the idea of Sewall Wright (1932) of viewing evolution as a hill-climbing process proved appealing
and inspired a vast amount of theoretical work in phenotypic and molecular evolution (Orr, 2005;
de Visser and Krug, 2014). Furthermore, this metaphor contributed to the scientific exchange
with other fields, especially computer science (Richter, 2014) and physics (Stein, 1992).


In evolutionary biology, fitness landscapes have been used to study adaptation. In the classical
metaphor, an evolving population is abstracted into a particle that navigates in the landscape
(Orr, 2005). In a strong selection weak mutation regime (Gillespie, 1983), the evolutionary paths
followed by populations are restricted to paths of increasing fitness, which can sometimes even be
completely deterministic. In this perspective, it has been emphasized that many fundamental
features of adaptation depend on whether the landscape is smooth or rugged. Among other things,
the ruggedness and other properties of fitness landscapes have been related to speciation
processes (Gavrilets, 2004; Chevin et al., 2014), to the benefits of sexual reproduction
(Kondrashov and Kondrashov, 2001; de Visser et al., 2009; Otto, 2009; Watson et al., 2011), and,
more generally, to the repeatability of the adaptation process (e.g. Kauffman (1993); Colegrave
and Buckling (2005); Chevin et al. (2010); Salverda et al. (2011)).


Consequently, it is now clear that several aspects of evolutionary processes directly depend on
the structure of the fitness landscapes in which organisms evolve. Furthermore, Wright’s idea of
genotype-fitness landscapes has moved from a metaphor to an object of experimental study, as
several fitness landscapes have been experimentally resolved (de Visser and Krug, 2014). In that
regard, characterizing the structure of experimental and model fitness landscapes is a key step
in our ability to decipher evolution at the finest scale. However, because fitness landscapes are
objects of (very) high dimensionality, their characterization relies on simple scalar measures
(i.e. statistics) that capture the important features of the landscapes. In this study, we
propose two new sets of measures for fitness landscapes that have an immediate biological
interpretation in terms of epistasis and evolutionary constraints.

One of the most basic ingredients that characterize the structure of fitness landscapes is
epistasis. Epistasis is the interaction between the effects of mutations at different loci. It is
usually defined as the non-multiplicative part of the fitness effects of combined mutations, that
is, the non-additive part on a log scale. In the presence of epistasis, the fitness effect of a
mutation at a given locus depends on the genetic background; consequently, a mutation at a given
locus changes the distribution of fitness effects of other mutations at other loci. For the
two-locus, two-allele case, assuming that the genotype with the smallest fitness is labeled 00,
epistasis can be expressed (on a log scale) as the departure from additivity:
$e = f(11) + f(00) - f(10) - f(01)$, where $f(ij)$ is the Malthusian fitness of genotype $ij$
(e.g. Phillips (2008)).
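As a small numerical illustration of this definition, the sketch below evaluates $e$ from four fitness values. The helper name `pairwise_epistasis` and the fitness values are hypothetical; fitnesses are converted to the log scale first, so that purely multiplicative effects give $e = 0$.

```python
import math


def pairwise_epistasis(f00: float, f01: float, f10: float, f11: float) -> float:
    """Two-locus epistasis e = f(11) + f(00) - f(10) - f(01) on log-scale fitnesses."""
    return f11 + f00 - f10 - f01


# Hypothetical relative fitnesses: the single mutants multiply fitness by 1.1 and 1.2.
w = {"00": 1.0, "01": 1.1, "10": 1.2}
w_multiplicative = {**w, "11": 1.1 * 1.2}   # no epistasis on the log scale
w_synergistic = {**w, "11": 1.5}            # double mutant fitter than expected

for label, fit in [("multiplicative", w_multiplicative), ("synergistic", w_synergistic)]:
    logf = [math.log(fit[k]) for k in ("00", "01", "10", "11")]
    print(label, round(pairwise_epistasis(*logf), 4))
```

A positive $e$ in the second case means that the double mutant is fitter than predicted from the two single mutants.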


Assuming random fitness values (i.e. NK landscapes), Kauffman (1993) showed a positive
correlation between the amount of epistatic interactions and the ruggedness of a landscape,
defined as the density of peaks (genotypes with no fitter neighbors). As the number of loci that
interact together grows, the landscape becomes more rugged, and more local peaks end evolutionary
paths. At the maximum number of epistatic interactions, the fitnesses of the genotypes are
completely uncorrelated, resulting in the so-called House-of-Cards model (Kingman, 1978; Kauffman
and Levin, 1987), for which several measures on paths of increasing fitness have been derived
(Kauffman and Levin, 1987; Franke et al., 2011; Hegarty et al., 2014; Berestycki et al., 2013).


Experimental fitness landscapes make it possible to test predicted properties of evolutionary
trajectories through evolution experiments combined with sequencing (Achaz et al., 2014).
Currently available fitness landscapes are based on mutations at a few loci, typically 4 to 10
(Szendro et al., 2013; Weinreich et al., 2013). Since the number of genotypes scales as the
product of the number of alleles at all loci, testing all the combinations of mutations in an
organism (or even in a protein) is beyond the reach of any reasonable future experiment. However,
small landscapes have been resolved and analyzed. An exciting opportunity raised by the recent
release of these experimental landscapes is to find the adequate model(s) able to generate
landscapes that share similar features with the observed ones. In that regard, characterizing the
structure of small fitness landscapes is today a key step towards understanding evolution in the
presence of realistic interactions among mutations.

The high dimensionality of fitness landscapes makes them almost impossible to visualize (although
some attempts have been made, see McCandlish (2011) or Brouillet et al. (submitted)). As a
consequence, the analysis of fitness landscapes mostly relies on measures that capture important
features of their structure. Several measures have been proposed and used to analyze experimental
fitness landscapes (reviewed in Szendro et al. (2013)). The most natural one, historically used to
assess the ruggedness of a landscape, is its number of peaks (Weinberger, 1991). Intriguingly,
although ruggedness is more adequately represented by both types of extrema, little attention has
been paid to the number of sinks (genotypes with only fitter neighbors). We therefore suggest that
peaks and sinks are both adequate measures of landscape ruggedness. Most models generate
landscapes with the same mean number of peaks and sinks; however, small landscapes can have
different numbers of peaks and sinks due to random sampling. Furthermore, there is no theoretical
reason to expect a symmetry between peaks and sinks in real fitness landscapes.
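Peaks and sinks are easy to count once a landscape is stored as an array indexed by genotype. The sketch below assumes bi-allelic loci and encodes a genotype as an integer whose bit $j$ is the allele at locus $j$ (our convention). It draws House-of-Cards landscapes as i.i.d. standard normal fitnesses, an arbitrary choice for illustration. In that model each genotype is a peak (or a sink) with probability $1/(L+1)$, so both means should be close to $2^L/(L+1)$, while individual small landscapes vary.

```python
import numpy as np


def peaks_and_sinks(f: np.ndarray, L: int) -> tuple:
    """Return (#peaks, #sinks): genotypes with no fitter / only fitter single-mutant neighbours.

    f[g] is the fitness of genotype g, whose bit j encodes the allele at locus j."""
    g = np.arange(2 ** L)
    # n_fitter[g] = number of loci j such that the mutant g ^ (1 << j) is fitter than g
    n_fitter = sum((f[g ^ (1 << j)] > f[g]).astype(int) for j in range(L))
    return int(np.sum(n_fitter == 0)), int(np.sum(n_fitter == L))


rng = np.random.default_rng(1)
L, reps = 6, 2000
# House of Cards: i.i.d. fitness values (standard normal here)
counts = np.array([peaks_and_sinks(rng.normal(size=2 ** L), L) for _ in range(reps)])
mean_peaks, mean_sinks = counts.mean(axis=0)
expected = 2 ** L / (L + 1)
print(f"mean peaks {mean_peaks:.2f}, mean sinks {mean_sinks:.2f}, expected {expected:.2f}")
print("first five landscapes (peaks, sinks):", counts[:5].tolist())
```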

Other measures, such as the $r/s$ ratio (ratio of the roughness to the slope of the additive
component) and the fraction of sign epistasis (see detailed descriptions in the Appendix), have
also been proposed to characterize the structure of fitness landscapes. As they all represent
direct or indirect measures of epistasis, these measures were shown to be pairwise correlated in
experimental fitness landscapes (Szendro et al., 2013). Therefore, other evolutionarily relevant
measures that are largely uncorrelated with the amount of epistasis are also needed to
investigate the nature of the interactions in the landscapes.

Here, we describe two new measures that can be used to characterize the structure of fitness
landscapes (Figure 1). The first one, $\gamma$, is the single-step correlation of fitness effects of
mutations between neighboring genotypes (Figure 1b). It is a direct measure of epistasis, i.e. it
measures how much the fitness effect of a mutation is affected when the genotype acquires another
mutation. Like all correlation coefficients, $\gamma$ ranges from $-1$ to $+1$, and it is a very
natural quantity to describe the amount of epistasis (ruggedness) in the landscape. The second
one, chains, aims at quantifying the amount of constrained evolution in the landscapes. In the
limit of strong selection and low mutation rates, when a hill-climbing evolutionary path reaches a
genotype that has a single fitter neighbor, the next step of the path is essentially
deterministic: only one of the mutations available in the landscape can improve fitness. If
several genotypes with a single way out are connected together, they form a chain of “obligatory”
mutational events that often ends on a peak (Figure 1c), but could also end on an intermediate
genotype.



Figure 1: Measures of fitness landscapes. We depict three measures on the same fitness landscape
(3 loci with 2 alleles each).

(a) Peaks, here in green, are genotypes with no fitter neighbors, whereas sinks are genotypes with
only fitter neighbors. The landscape shown has 1 peak and 2 sinks.

(b) $\gamma$ is the pairwise correlation in the fitness effect of a mutation between neighboring
genotypes. It measures how much another mutation in a genotype affects the focal mutation,
averaged across all mutations and the whole landscape. Here the average correlation is high
($\gamma \approx 0.7$). $\gamma_{\to i}$ is the correlation in the fitness effect of mutation $i$
between neighboring genotypes. In the example, the effects of mutations at locus 1 are almost
independent of the genotype ($\gamma_{\to 1} \approx 0.9$), whereas the effects of mutations at
locus 3 show almost no correlation across genotypes ($\gamma_{\to 3} \approx 0$).

(c) In this landscape, there is a single chain tree composed of 4 steps (genotypes with a single
fitter neighbor), 2 origins and a depth (the largest number of steps chained together) of 3.


The two measures target different features of the landscapes: $\gamma$ aims at quantifying the
amount of epistasis, independently of its nature, whereas chains discriminate between different
types of epistasis. In the following, we present both measures in detail, then discuss their
relations with existing measures and finally quantify them on two experimental landscapes. We also
present the mean values of these measures for several landscape models, such as the House of
Cards (HoC), the Rough Mount Fuji (RMF) and NK landscapes, as well as the Ising and Eggbox
models. Details on previous measures and on the model landscapes used here are given in the
Appendix.


2 Epistasis as correlation of fitness effects: $\gamma$


2.1 Definition 

In this section, we derive and discuss a new measure that is a natural description of the amount
of epistasis in fitness landscapes. This new measure, denoted by $\gamma$, is simply the
correlation of fitness effects of the same mutation in single-mutant neighbors (see Figures 1 and
2). It measures how the effect of a focal mutation is altered by another mutation at another locus
in the background, averaged across the whole landscape.


[Figure 2 panels. (a) Definitions: fitness effects $s_j(g)$ and $s_j(g[i])$ of the mutation at
locus $j$ in genotypes $g$ and $g[i]$. (b) Types of epistasis: magnitude epistasis
($1 > \gamma > 0$); sign epistasis ($1 > \gamma > -1/3$); reciprocal sign epistasis
($0 > \gamma > -1$).]

Figure 2: (a) Notation: $\gamma$ is the correlation between the fitness effects $s_j(g)$ and
$s_j(g[i])$ over all genotypes $g$ and mutations $i, j$ in the landscape. (b) Types of epistasis,
possible values of $\gamma$ and examples of the corresponding fitness graphs.


In the following, we define it properly in mathematical terms for the bi-allelic case with $L$
loci. We denote the (log-scaled) fitness of a genotype $g$ by $f(g)$. We also define $g[i]$, the
genotype $g$ in which locus $i$ is mutated. The fitness effect of a mutation at locus $j$, i.e. the
log-scale selection coefficient of the mutation, is denoted by $s_j(g) = f(g[j]) - f(g)$. The new
measure $\gamma$ is then defined as the correlation between two fitness differences, $s_j(g)$ and
$s_j(g[i])$, measured from genotypes that are one mutation away, as illustrated in Figure 2.

Noting that the average of $s_j(g)$ across all genotypes and mutations in the landscape is 0
(since $s_j(g[j]) = -s_j(g)$), we define $\gamma$ as:

$$
\gamma = \mathrm{Cor}[s(g), s(g_1)] = \frac{\mathrm{Cov}[s(g), s(g_1)]}{\mathrm{Var}[s(g)]}
= \frac{\sum_g \sum_j \sum_{i \neq j} s_j(g)\, s_j(g[i])}{(L-1) \sum_g \sum_j s_j(g)^2} \tag{1}
$$

where $g_1$ indicates a generic genotype that differs from $g$ by a single mutation. For
multiallelic landscapes, the same definition $\gamma = \mathrm{Cor}[s(g), s(g_1)]$ can be
immediately generalized to any number of alleles.
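A direct transcription of equation (1) for a complete bi-allelic landscape is sketched below. It assumes that genotypes are encoded as integers whose bit $j$ is the allele at locus $j$, so that $g[i]$ is `g ^ (1 << i)`; the function name `gamma` is ours. Two toy landscapes, an additive one and an eggbox-like one in which every mutation reverses the sign of every other, should give the extreme values $\gamma = 1$ and $\gamma = -1$.

```python
import numpy as np


def gamma(f: np.ndarray, L: int) -> float:
    """Correlation of fitness effects between single-mutant neighbours, eq. (1).

    f[g] is the log-fitness of genotype g; bit j of g is the allele at locus j,
    so the mutant g[j] is g ^ (1 << j)."""
    g = np.arange(2 ** L)
    # s[j, g] = s_j(g) = f(g[j]) - f(g)
    s = np.array([f[g ^ (1 << j)] - f[g] for j in range(L)])
    num = 0.0
    for i in range(L):
        for j in range(L):
            if i != j:
                # s[j, g ^ (1 << i)] = s_j(g[i]): same mutation j, in the background g[i]
                num += np.dot(s[j], s[j, g ^ (1 << i)])
    return num / ((L - 1) * np.sum(s ** 2))


L = 5
g = np.arange(2 ** L)
bits = (g[:, None] >> np.arange(L)) & 1        # genotype x locus matrix of alleles
additive = bits @ np.linspace(0.5, 1.5, L)     # no epistasis
eggbox = (-1.0) ** bits.sum(axis=1)            # reciprocal sign epistasis everywhere
print(gamma(additive, L), gamma(eggbox, L))
```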

Even though $\gamma$ is originally defined in terms of the fitness effects of mutations, it can
easily be recomputed using only the fitness values themselves. If we denote by
$\rho_d = \mathrm{Cor}[f(g), f(g_d)]$ the correlation between the fitnesses of genotypes at Hamming
distance $d$, it is possible to rewrite $\gamma$ with the simple formula (see proof in the
Appendix):

$$
\gamma = \frac{\rho_1 - \rho_2}{1 - \rho_1} \tag{2}
$$

Besides its general interest, this formulation allows us to measure $\gamma$ in the presence of
missing data. Indeed, fitness correlation functions can be estimated without fitness data for all
combinations of a given set of mutations.
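Equation (2) can be checked numerically. The sketch below estimates $\rho_d$ as the Pearson correlation over all ordered pairs of genotypes at Hamming distance $d$ in a complete bi-allelic landscape (genotypes as integers, bit $j$ = allele at locus $j$), evaluates equation (2), and compares it with a compact transcription of equation (1). The function names and the random test landscape (additive effects plus Gaussian noise) are illustrative choices. With complete data the two routes should agree to rounding error; with missing data one would instead average over the available pairs, which is not shown here.

```python
import numpy as np
from itertools import combinations


def rho(f: np.ndarray, L: int, d: int) -> float:
    """rho_d = Cor[f(g), f(g_d)] over all ordered pairs of genotypes at Hamming distance d."""
    fc = f - f.mean()
    g = np.arange(2 ** L)
    # every set D of d mutated loci contributes 2**L pairs (g, g with the loci in D flipped)
    cov = np.mean([np.mean(fc * fc[g ^ sum(1 << i for i in D)])
                   for D in combinations(range(L), d)])
    return cov / np.mean(fc ** 2)


def gamma_from_rho(f: np.ndarray, L: int) -> float:
    """Equation (2): gamma = (rho_1 - rho_2) / (1 - rho_1)."""
    r1, r2 = rho(f, L, 1), rho(f, L, 2)
    return (r1 - r2) / (1 - r1)


def gamma_direct(f: np.ndarray, L: int) -> float:
    """Equation (1), written directly in terms of the fitness effects s_j(g)."""
    g = np.arange(2 ** L)
    s = np.array([f[g ^ (1 << j)] - f[g] for j in range(L)])
    num = sum(np.dot(s[j], s[j, g ^ (1 << i)])
              for i in range(L) for j in range(L) if i != j)
    return num / ((L - 1) * np.sum(s ** 2))


rng = np.random.default_rng(0)
L = 6
bits = (np.arange(2 ** L)[:, None] >> np.arange(L)) & 1
f = bits @ rng.normal(1.0, 0.3, size=L) + rng.normal(0.0, 0.7, size=2 ** L)
print(gamma_from_rho(f, L), gamma_direct(f, L))
```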


2.2 Interpretation 


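To make clear that the above measure is a metric of epistasis, we rewrite the above equation as

$$
1 - \gamma = \frac{E[(s(g) - s(g_1))^2]}{2\,E[s^2]} = \frac{E[e^2]}{2\,E[s^2]}
= \frac{\sum_g \sum_j \sum_{i \neq j} \left(s_j(g) - s_j(g[i])\right)^2}{2(L-1) \sum_g \sum_j s_j(g)^2} \tag{3}
$$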


When there is no epistasis, the fitness effects do not depend on the background and $\gamma = 1$,
i.e. there is perfect correlation between fitness effects. The deviation of $\gamma$ from 1 is
proportional to the mean square of $e_{ij} = f(g[ij]) - f(g[i]) - f(g[j]) + f(g)$, which is a
standard measure of the amount of epistasis (for two alleles at two loci), normalized by the mean
squared fitness effect. Thanks to this normalization, the measure is not affected by the scale or
the absolute level of fitness, but only by relative differences in fitness: multiplying all
(log-)fitnesses by a constant factor or adding a constant to them does not change $\gamma$.
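Both equation (3) and the invariance under rescaling and shifting can be illustrated with a short sketch. It computes $1 - \gamma$ as the mean squared epistasis $e_{ij}(g)$ over all genotypes and ordered pairs of loci, divided by twice the mean squared fitness effect, for a complete bi-allelic landscape with integer genotype encoding (bit $j$ = allele at locus $j$). The function name and the random test landscape are illustrative.

```python
import numpy as np


def gamma_via_epistasis(f: np.ndarray, L: int) -> float:
    """gamma = 1 - E[e^2] / (2 E[s^2]), with e_ij(g) = f(g[ij]) - f(g[i]) - f(g[j]) + f(g)."""
    g = np.arange(2 ** L)

    def mutant(i):
        # indices of the single mutants g[i] for all genotypes g
        return g ^ (1 << i)

    e2 = [np.mean((f[mutant(i) ^ (1 << j)] - f[mutant(i)] - f[mutant(j)] + f[g]) ** 2)
          for i in range(L) for j in range(L) if i != j]
    s2 = [np.mean((f[mutant(j)] - f[g]) ** 2) for j in range(L)]
    return 1 - np.mean(e2) / (2 * np.mean(s2))


rng = np.random.default_rng(3)
L = 5
f = rng.normal(size=2 ** L)                       # an arbitrary random landscape
print(gamma_via_epistasis(f, L), gamma_via_epistasis(3.0 * f + 7.0, L))
```

Rescaling and shifting change $e_{ij}$ and $s_j$ by the same factor (shifts cancel in both), so the ratio, and hence $\gamma$, is unchanged.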

Since $\gamma$ is defined as a correlation, it is bounded, $-1 \le \gamma \le 1$, with $\gamma = 1$
in the absence of epistasis. The value of $\gamma$ is related to the prevalent type of epistatic
interactions (see proof in the Appendix), as summarized in Figure 2:

• magnitude epistasis refers to pairwise interactions that do not change the signs of fitness
effects. Magnitude epistasis still results in a positive correlation between fitness effects;
therefore $\gamma$ remains positive, even if smaller than 1: $1 > \gamma > 0$;

• sign epistasis refers to pairwise interactions where the fitness effect of one mutation changes
sign in the presence of the other mutation. Sign epistasis contributes terms of both signs to the
correlation, therefore resulting in values centered around 0: $1 > \gamma > -1/3$;

• finally, reciprocal sign epistasis refers to pairwise interactions where both fitness effects
change sign. Reciprocal sign epistasis implies a negative correlation between fitness effects, and
therefore a negative value of $\gamma$: $0 > \gamma > -1$.
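The three classes can be told apart mechanically from the four fitness values of a two-locus motif, by counting how many of the two mutations change the sign of their effect in the other background. The sketch below does this under the assumption of no neutral mutations. It labels genotypes $ab$, with $a$ the allele at the first locus and $b$ at the second, and counts motifs without any sign change (including non-epistatic ones) as magnitude epistasis. The function name is hypothetical.

```python
def motif_type(f00: float, f01: float, f10: float, f11: float) -> str:
    """Classify a two-locus motif as 'magnitude', 'sign' or 'reciprocal sign' epistasis.

    fab is the (log-)fitness of the genotype with allele a at the first locus and b at
    the second. Assumes no neutral mutations (no effect exactly equal to zero)."""
    # Does the first-locus mutation change sign between backgrounds b = 0 and b = 1?
    first_flips = (f10 - f00) * (f11 - f01) < 0
    # Does the second-locus mutation change sign between backgrounds a = 0 and a = 1?
    second_flips = (f01 - f00) * (f11 - f10) < 0
    return ("magnitude", "sign", "reciprocal sign")[int(first_flips) + int(second_flips)]


print(motif_type(0.0, 1.0, 1.0, 3.0))    # -> magnitude (both effects stay positive)
print(motif_type(0.0, 1.0, -1.0, 2.0))   # -> sign (first-locus effect changes sign)
print(motif_type(0.0, -1.0, -1.0, 1.0))  # -> reciprocal sign (fitness valley)
```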

The deviation of the mean value of $\gamma$ from 1 in simple landscape models measures epistasis as
a function of the model parameters (see the Appendix for details on the derivations; readers
unfamiliar with the models will also find a brief presentation there). For example, in NK
landscapes, epistasis grows with the parameter $K$, which describes the number of loci involved in
each interaction, and we have the approximate equation

$$
E[\gamma] \approx 1 - \frac{K}{L-1} \tag{4}
$$

For the HoC model, i.e. a maximally uncorrelated landscape, we have $K = L - 1$ and therefore

$$
E[\gamma] \approx 0 \tag{5}
$$

i.e. this model shows strong random epistasis.
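Equation (4) can be made plausible with a short exact computation. Assume that each of the $L$ fitness components of an NK landscape depends on $K+1$ loci and is redrawn independently whenever one of them mutates. Then the expected fitness correlation at distance $d$ is the probability that a random set of $d$ mutated loci misses all $K+1$ loci of a component, $\binom{L-K-1}{d}/\binom{L}{d}$. Plugging these expected correlations into equation (2) gives exactly $1 - K/(L-1)$; the equation is only approximate for $E[\gamma]$ because the expectation of a ratio is not the ratio of expectations. The sketch below, with hypothetical function names, checks this with exact rational arithmetic.

```python
from fractions import Fraction
from math import comb


def nk_rho(L: int, K: int, d: int) -> Fraction:
    """Expected fitness correlation at Hamming distance d in an NK landscape (see lead-in)."""
    # a component is unchanged iff none of its K+1 loci is among the d mutated loci
    return Fraction(comb(L - K - 1, d), comb(L, d))


def nk_gamma(L: int, K: int) -> Fraction:
    """Equation (2) evaluated on the expected correlations: (rho_1 - rho_2) / (1 - rho_1)."""
    r1, r2 = nk_rho(L, K, 1), nk_rho(L, K, 2)
    return (r1 - r2) / (1 - r1)


print(nk_gamma(10, 3))   # prints 2/3, i.e. 1 - 3/9
print(nk_gamma(10, 9))   # prints 0: K = L - 1 is the HoC limit
```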

For RMF models, which are combinations of an additive landscape and a completely uncorrelated one,
the correlation of fitness effects is

$$
E[\gamma] \approx 1 - \frac{2\sigma_{HoC}^2}{\mu_a^2 + \sigma_a^2 + 2\sigma_{HoC}^2} \tag{6}
$$

where $\mu_a$ and $\sigma_a^2$ are the mean and variance of the additive fitness effects and
$\sigma_{HoC}^2$ is the variance of the uncorrelated HoC component. Therefore, in this case, the
measure of epistasis $1 - E[\gamma]$ is determined by the relative variance contribution of the
uncorrelated component.
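Equation (6) can be compared with a quick simulation. The sketch below assumes Gaussian additive effects $a_j \sim N(\mu_a, \sigma_a^2)$ and i.i.d. Gaussian HoC noise with variance $\sigma_{HoC}^2$, uses the integer genotype encoding (bit $j$ = allele at locus $j$), and averages a direct transcription of equation (1) over 20 landscapes. Parameter values and function names are illustrative. The simulated mean should lie close to the prediction, up to sampling noise.

```python
import numpy as np


def gamma(f: np.ndarray, L: int) -> float:
    """Equation (1) for a complete bi-allelic landscape (bit j of g = allele at locus j)."""
    g = np.arange(2 ** L)
    s = np.array([f[g ^ (1 << j)] - f[g] for j in range(L)])
    num = sum(np.dot(s[j], s[j, g ^ (1 << i)])
              for i in range(L) for j in range(L) if i != j)
    return num / ((L - 1) * np.sum(s ** 2))


def rmf_landscape(L, mu_a, sigma_a, sigma_hoc, rng):
    """Rough Mount Fuji: additive effects a_j ~ N(mu_a, sigma_a^2) plus i.i.d. N(0, sigma_hoc^2)."""
    bits = (np.arange(2 ** L)[:, None] >> np.arange(L)) & 1
    a = rng.normal(mu_a, sigma_a, size=L)
    return bits @ a + rng.normal(0.0, sigma_hoc, size=2 ** L)


L, mu_a, sigma_a, sigma_hoc = 10, 1.0, 0.1, 1.0
rng = np.random.default_rng(42)
simulated = np.mean([gamma(rmf_landscape(L, mu_a, sigma_a, sigma_hoc, rng), L)
                     for _ in range(20)])
predicted = 1 - 2 * sigma_hoc ** 2 / (mu_a ** 2 + sigma_a ** 2 + 2 * sigma_hoc ** 2)
print(f"simulated {simulated:.3f}, predicted by eq. (6) {predicted:.3f}")
```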
2.3 Epistasis for specific mutations

The correlation of fitness effects is also a useful measure of the interaction between specific
mutations. Some simple generalizations of the $\gamma$ measure are:

• $\gamma_{i\to}$, which describes the epistatic effect of a mutation at locus $i$ on other loci:

$$
\gamma_{i\to} = \mathrm{Cor}[s_j(g), s_j(g[i])]
= \frac{\sum_g \sum_{j \neq i} s_j(g)\, s_j(g[i])}{\sum_g \sum_{j \neq i} s_j(g)^2} \tag{7}
$$

• $\gamma_{\to j}$, which describes the epistatic effects of other mutations on locus $j$:

$$
\gamma_{\to j} = \mathrm{Cor}[s_j(g), s_j(g_1)]
= \frac{\sum_g \sum_{i \neq j} s_j(g)\, s_j(g[i])}{(L-1) \sum_g s_j(g)^2} \tag{8}
$$

• $\gamma_{i\to j}$, which is a matrix that describes the epistatic effect of locus $i$ on locus $j$:

$$
\gamma_{i\to j} = \mathrm{Cor}[s_j(g), s_j(g[i])]
= \frac{\sum_g s_j(g)\, s_j(g[i])}{\sum_g s_j(g)^2} \tag{9}
$$

These measures can also be generalized easily to multiallelic landscapes by considering pairs of
mutations at different loci.

The measures $\gamma_{\to j}$ and especially $\gamma_{i\to j}$ are useful for exploratory and
illustrative purposes, since they summarize the interactions between mutations in a clear and
compact way, as can be seen in Figure ??.

It is also possible to use the more direct measure $E[e_{ij}^2]$ as an alternative to
$\gamma_{i\to j}$. The difference lies in the normalization:
$\gamma_{i\to j} = 1 - E[e_{ij}^2]/(2E[s_j^2])$; therefore $\gamma_{i\to j}$ treats large and small
mutations in the same way, while $E[e_{ij}^2]$ is larger for large mutations. The choice of the
most appropriate measure depends on the question, i.e. whether the focus is on the interactions
across all mutations or only on the largest ones.
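The three generalizations share the same building blocks, the sums $\sum_g s_j(g)\, s_j(g[i])$ and $\sum_g s_j(g)^2$, so they can be computed together. The sketch below does so for a complete bi-allelic landscape with integer genotype encoding (bit $j$ = allele at locus $j$). The function name and the test landscape (unit additive effects plus a single interaction of strength 3 between loci 0 and 1) are illustrative.

```python
import numpy as np


def specific_gammas(f: np.ndarray, L: int):
    """Return (G, gamma_from, gamma_to) following eqs. (7)-(9).

    G[i, j] = gamma_{i->j} (diagonal set to NaN), gamma_from[i] = gamma_{i->},
    gamma_to[j] = gamma_{->j}."""
    g = np.arange(2 ** L)
    s = np.array([f[g ^ (1 << j)] - f[g] for j in range(L)])   # s[j, g] = s_j(g)
    C = np.zeros((L, L))                  # C[i, j] = sum_g s_j(g) s_j(g[i]); diagonal stays 0
    for i in range(L):
        for j in range(L):
            if i != j:
                C[i, j] = np.dot(s[j], s[j, g ^ (1 << i)])
    S = np.sum(s ** 2, axis=1)            # S[j] = sum_g s_j(g)^2
    G = C / S[None, :]                    # eq. (9)
    np.fill_diagonal(G, np.nan)
    gamma_to = C.sum(axis=0) / ((L - 1) * S)      # eq. (8)
    gamma_from = C.sum(axis=1) / (S.sum() - S)    # eq. (7): denominator sums over j != i
    return G, gamma_from, gamma_to


L = 4
bits = (np.arange(2 ** L)[:, None] >> np.arange(L)) & 1
f = bits.sum(axis=1) + 3.0 * bits[:, 0] * bits[:, 1]   # loci 0 and 1 interact
G, gamma_from, gamma_to = specific_gammas(f, L)
print(np.round(G, 3))
```

Only the entries linking loci 0 and 1 drop below 1; all other pairs behave additively.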

2.4 Decay of the correlation with distance 

The $\gamma$ measure provides information on the amount of epistasis but cannot discriminate
between different types or models of fitness landscapes, as is the case for any single measure of
the amount or strength of epistatic interactions. In fact, there are many landscapes with widely
different structures but the same $\gamma$. For example, a HoC model realization would have
$\gamma \approx 0$, as would a landscape composed of an equal mixture of additive and reciprocal
sign epistatic interactions (as in an EMF model).

However, a natural and interesting extension of this measure is given by the full decay of the
correlation of fitness effects with distance $d$, which corresponds to the cumulative epistatic
effect of $d$ mutations and can be defined as:

$$
\gamma_d = \mathrm{Cor}[s(g), s(g_d)]
= \frac{\sum_g \sum_{i_1} \sum_{i_2 > i_1} \cdots \sum_{i_d > i_{d-1}} \sum_{j \notin \{i_1, \ldots, i_d\}} s_j(g)\, s_j(g[i_1 i_2 \ldots i_d])}{\binom{L-1}{d} \sum_g \sum_j s_j(g)^2} \tag{10}
$$

where $\gamma_1 = \gamma$. As with $\gamma$, $\gamma_d$ can be expressed in terms of the fitness
correlation functions $\rho_d$:

$$
\gamma_d = \frac{\rho_d - \rho_{d+1}}{1 - \rho_1} \tag{11}
$$
The decay of $\gamma_d$ with the Hamming distance $d$ is an interesting object of study in itself,
since it describes how the epistatic effects of different mutations interact with each other, and
their cumulative effect. The mean of $\gamma_d$ can be computed analytically in most models of
fitness landscapes (see the first section of the Appendix), and it brings extra information on the
structure of the landscape. Different models behave differently (Figure 3): RMF and HoC models show
an abrupt fall already at $d = 1$ followed by a flat profile, while NK models show a gradual,
approximately exponential decay with rate $K/(L-1)$ (Supplementary Figure 1). Models based on the
Ising model (pairwise reciprocal sign epistasis) decay linearly down to $-1$, while the eggbox
model (maximally epistatic, anticorrelated) oscillates between $-1$ and $1$.

Note that $|\gamma_d| \le 1$, together with equation (11), implies a general bound on the decay of
the fitness correlation functions with distance. In detail, the bound is

$$
|\rho_{d+1} - \rho_d| \le 1 - \rho_1 \tag{12}
$$

i.e. the change of the fitness correlation function between consecutive distances never exceeds
its one-step decay $1 - \rho_1$.
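The profile $\gamma_d$ can be obtained from the fitness correlation functions alone via equation (11). The sketch below does this for a complete bi-allelic landscape (integer genotype encoding, $\rho_d$ estimated over all ordered pairs at distance $d$); names and test landscapes are illustrative. It uses a noise-free eggbox, $f(g) = (-1)^{|g|}$ with $|g|$ the number of mutated loci, a simplification of the eggbox model. An additive landscape should give $\gamma_d = 1$ at all distances and this eggbox should alternate between $-1$ and $1$, while bound (12) can be checked on a random landscape.

```python
import numpy as np
from itertools import combinations


def rho(f: np.ndarray, L: int, d: int) -> float:
    """rho_d = Cor[f(g), f(g_d)] over all ordered pairs at Hamming distance d."""
    fc = f - f.mean()
    g = np.arange(2 ** L)
    cov = np.mean([np.mean(fc * fc[g ^ sum(1 << i for i in D)])
                   for D in combinations(range(L), d)])
    return cov / np.mean(fc ** 2)


def gamma_profile(f: np.ndarray, L: int):
    """Return ([gamma_1, ..., gamma_{L-1}], [rho_0, ..., rho_L]) using eq. (11)."""
    r = [rho(f, L, d) for d in range(L + 1)]
    gam = [(r[d] - r[d + 1]) / (1 - r[1]) for d in range(1, L)]
    return np.array(gam), np.array(r)


L = 6
bits = (np.arange(2 ** L)[:, None] >> np.arange(L)) & 1
additive = bits @ np.linspace(0.5, 1.5, L)
eggbox = (-1.0) ** bits.sum(axis=1)        # noise-free eggbox: parity of the genotype
print(np.round(gamma_profile(additive, L)[0], 6))
print(np.round(gamma_profile(eggbox, L)[0], 6))
```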


2.5 Correlation in signs ($\gamma^*$)

In many experimental situations, fitness is not clearly measurable on an absolute scale, but it is
possible to rank the genotypes in order of increasing fitness, or at least to state whether a
mutation is deleterious or beneficial.

The fitness landscape can then be represented as a directed acyclic graph, i.e. a directed network
where links between genotypes represent single, fitness-increasing mutations. Hereafter, we refer
to this graph as the fitness graph. As an example, the fitness graphs corresponding to different
types of epistasis for 2 loci are illustrated in Figure 2b.

In this context, it is still possible to measure epistasis via the same method by employing a
modified measure $\gamma^*$, which uses only the sign of the fitness effects instead of their value.
We define $s^*_j(g)$ as the sign of $s_j(g)$. A more robust variant would be

$$
s^*_j(g) = \begin{cases} +1 & \text{for } s_j(g) > \epsilon \\ 0 & \text{for } -\epsilon \le s_j(g) \le \epsilon \\ -1 & \text{for } s_j(g) < -\epsilon \end{cases} \tag{13}
$$

where $\epsilon$ is a tolerance parameter (possibly depending on the genotype, and larger than the
experimental errors). The measure $\gamma^*$ is defined as before:

$$
\gamma^* = \mathrm{Cor}[s^*(g), s^*(g_1)]
= \frac{\sum_g \sum_j \sum_{i \neq j} s^*_j(g)\, s^*_j(g[i])}{(L-1) \sum_g \sum_j s^*_j(g)^2} \tag{14}
$$

If the landscape has no neutral mutations, we can show that this measure is related to other
commonly employed measures for fitness graphs. Consider all possible pairwise mutational motifs in
the fitness graph and classify the type of epistasis in each motif as magnitude epistasis, sign
epistasis or reciprocal sign epistasis (see Figure 2), counting motifs without any sign change,
including those without epistasis, as magnitude epistasis. Denoting the fraction of motifs in each
class by $\phi_m$, $\phi_s$ and $\phi_{rs}$ respectively, we have the relation (see proof in the
Appendix)

$$
\gamma^* = 1 - \phi_s - 2\phi_{rs} \tag{15}
$$

What is even more interesting is that, both in models and in real landscapes, the values of
$\gamma$ and $\gamma^*$ are often numerically close and highly correlated (see below). The only
exception is represented by landscapes with weak epistatic interactions dominated by magnitude
epistasis, where $\gamma^* = 1$. This suggests that $\gamma^*$ could be used in place of $\gamma$ for
landscapes where only fitness ranks are known. These measures therefore represent a bridge between
fitness graph-based measures and quantitative measures based on absolute fitness.
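Relation (15) can be checked numerically. The sketch below computes $\gamma^*$ from equation (14), with the tolerance variant (13), and separately the fractions of two-locus motifs with zero, one or two sign changes, for a complete bi-allelic landscape with integer genotype encoding (bit $j$ = allele at locus $j$). Names, the tolerance default and the random test landscape are illustrative. With continuous random fitnesses there are no neutral mutations, so the relation should hold up to rounding.

```python
import numpy as np


def gamma_star(f: np.ndarray, L: int, eps: float = 0.0) -> float:
    """gamma* of eq. (14), using the signs s*_j(g) of eq. (13) with tolerance eps."""
    g = np.arange(2 ** L)
    s = np.array([f[g ^ (1 << j)] - f[g] for j in range(L)])
    s_star = np.where(s > eps, 1.0, np.where(s < -eps, -1.0, 0.0))
    num = sum(np.dot(s_star[j], s_star[j, g ^ (1 << i)])
              for i in range(L) for j in range(L) if i != j)
    return num / ((L - 1) * np.sum(s_star ** 2))


def motif_fractions(f: np.ndarray, L: int) -> np.ndarray:
    """Fractions (phi_m, phi_s, phi_rs) of two-locus motifs with 0, 1 or 2 sign changes."""
    counts = np.zeros(3)
    for g in range(2 ** L):
        for i in range(L):
            for j in range(i + 1, L):
                if g & (1 << i) or g & (1 << j):
                    continue  # visit each motif once, from its member with allele 0 at i and j
                gi, gj, gij = g | (1 << i), g | (1 << j), g | (1 << i) | (1 << j)
                flips = int((f[gj] - f[g]) * (f[gij] - f[gi]) < 0)   # mutation j changes sign
                flips += int((f[gi] - f[g]) * (f[gij] - f[gj]) < 0)  # mutation i changes sign
                counts[flips] += 1
    return counts / counts.sum()


rng = np.random.default_rng(7)
L = 6
bits = (np.arange(2 ** L)[:, None] >> np.arange(L)) & 1
f = bits @ rng.normal(0.5, 0.2, size=L) + rng.normal(0.0, 0.5, size=2 ** L)
phi_m, phi_s, phi_rs = motif_fractions(f, L)
print(gamma_star(f, L), 1 - phi_s - 2 * phi_rs)
```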

3 Constraints in mutation order: chains 

Many fitness landscape measures are correlated with the amount and strength of epistasis in the
landscape. However, landscapes with similar amounts of epistasis can differ widely in the structure
and origin of their epistatic interactions and in their evolutionary properties. One of these
properties is the amount of constraint on the possible evolutionary paths.

A natural measure of evolutionary constraints is given by the abundance and structure of maximally
constrained paths. Our aim is to characterize these paths in a simple and effective way. For that,
we focus on genotypes with only a single beneficial mutation. All fitness-increasing paths that
pass through such a genotype share this mutation. Some landscapes have “chains” of consecutive
mutations with this property (see Figure 1c). We now examine the abundance and size of these
chains as measures of evolutionary constraints.

We define a chain step as a mutation $g \to g'$ that is the only possible fitness-increasing
mutation from the genotype $g$. Chain steps can occur one after another, forming a linear path of
obligatory mutational steps that we call a chain. Several chain steps can lead to the same
genotype, but a genotype can have at most one outgoing chain step. For this reason, chains can form
tree-like structures that we call chain trees. A chain tree is formally the set of all genotypes
that are forced to evolve along obligatory paths up to a common final genotype, i.e. a maximal
group of connected chain steps. This definition implicitly assumes a strong selection regime
(Gillespie, 1983), as mutations with negative fitness effects do not fix. In an additive landscape,
there is a single chain tree containing only the genotypes one mutation away from the (single)
peak. Other landscapes can contain more than one chain tree (or none). We also compute the number
of origins, i.e. the genotypes that are initial points of a chain (which excludes intermediate
steps). Finally, we compute the maximal depth of all chains in the landscape, that is, the maximum
number of consecutive steps. In an additive landscape, the depth of the only chain tree is 1.
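The chain quantities defined above can be computed in a few lines. The sketch below assumes a complete bi-allelic landscape with genotypes encoded as integers (bit $j$ = allele at locus $j$) and no ties between neighboring fitnesses. It records each chain step $g \to g'$, identifies a chain tree by its final genotype, and follows every chain from its origin to obtain the depth. The function name and the small test landscapes are hypothetical.

```python
def chain_measures(f, L):
    """Number of chain steps, chain trees, origins and maximal depth of a bi-allelic landscape.

    f[g] is the fitness of genotype g; bit j of g is the allele at locus j."""
    step = {}  # g -> g' whenever g' is the unique fitter single-mutant neighbour of g
    for g in range(2 ** L):
        fitter = [g ^ (1 << j) for j in range(L) if f[g ^ (1 << j)] > f[g]]
        if len(fitter) == 1:
            step[g] = fitter[0]
    has_incoming = set(step.values())
    # origins: genotypes that start a chain, i.e. chain steps with no chain step leading into them
    origins = [g for g in step if g not in has_incoming]

    def end_and_length(g):
        # follow obligatory steps until a genotype without an outgoing chain step
        k = 0
        while g in step:
            g, k = step[g], k + 1
        return g, k

    ends, depth = set(), 0
    for g in origins:
        end, k = end_and_length(g)
        ends.add(end)          # each chain tree is identified by its common final genotype
        depth = max(depth, k)
    return {"steps": len(step), "trees": len(ends), "origins": len(origins), "depth": depth}


L = 4
effects = [1.0, 1.1, 1.2, 1.3]
additive = [sum(effects[j] for j in range(L) if (g >> j) & 1) for g in range(2 ** L)]
print(chain_measures(additive, L))               # {'steps': 4, 'trees': 1, 'origins': 4, 'depth': 1}
print(chain_measures([0.0, 1.0, -1.0, 2.0], 2))  # {'steps': 2, 'trees': 1, 'origins': 1, 'depth': 2}
```

The additive case reproduces the star-like tree of depth 1 described above, while the second, two-locus landscape contains a single chain 00 → 01 → 11 of depth 2.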


3.1 Out-degree distribution of fitness graphs 

Chains are a natural choice for a measure of evolutionary constraints in the framework of fitness 
graphs. In particular, chain steps are strongly related to the out-degree distribution of the fitness 
graph corresponding to the landscape. 

In fact, there is a one-to-one correspondence between chain steps and genotypes with a single
fitness-increasing mutation. The number of fitness-increasing mutations from a given genotype is
the out-degree of the genotype in the fitness graph. Therefore, chain steps simply correspond to
nodes of out-degree 1.

One of the most well-studied measures of fitness landscapes is the number of peaks. Since there are
no fitness-increasing mutations from a peak, peaks are simply nodes with out-degree 0. Therefore,
the number of peaks is the first bin of the out-degree distribution. Similarly, the number of sinks
corresponds to the number of genotypes with out-degree $L$ (the last bin of the out-degree
distribution). The number of chain steps is the second bin of the out-degree distribution of the
fitness graph, and is therefore a natural next step in the characterization of the out-degree
distribution.

For the small empirical landscapes currently available ($L = 4$ to 10), chains capture most of the
local information about evolutionary constraints. For larger landscapes, nodes with out-degree 2,
3 or more would also be relevant to assess the amount of constraints. For these landscapes, the
full out-degree distribution could be an interesting object of study.
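Since peaks, chain steps and sinks are bins 0, 1 and $L$ of the same histogram, a single pass over the landscape gives all three. The sketch below assumes a complete bi-allelic landscape with integer genotype encoding (bit $j$ = allele at locus $j$); the function name is ours. For the additive test landscape, with all beneficial alleles at 1, the out-degree of a genotype equals its number of 0 alleles, so the histogram is binomial.

```python
import numpy as np


def out_degree_distribution(f: np.ndarray, L: int) -> np.ndarray:
    """counts[k] = number of genotypes with exactly k fitter single-mutant neighbours."""
    g = np.arange(2 ** L)
    out_degree = sum((f[g ^ (1 << j)] > f[g]).astype(int) for j in range(L))
    return np.bincount(out_degree, minlength=L + 1)


L = 3
bits = (np.arange(2 ** L)[:, None] >> np.arange(L)) & 1
counts = out_degree_distribution(bits @ np.array([1.0, 2.0, 4.0]), L)
peaks, chain_steps, sinks = counts[0], counts[1], counts[L]
print(counts.tolist(), peaks, chain_steps, sinks)   # [1, 3, 3, 1] 1 3 1
```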

Figure 4 reports the out-degree distributions for four types of epistatic interactions (none,
random, pairwise incompatibilities and higher-order compensatory interactions) corresponding to
four theoretical landscape models (Additive, HoC, Ising and Eggbox). The results show that the
out-degree distribution usually differs between types of interactions. However, it is noteworthy
that the Ising and HoC models show the same average out-degree distributions, even though the
nature of their epistatic interactions and their overall structure are fundamentally different
(see Figure 3 and the details of the models in the Appendix).

[Figure 4 panels: left, one realization; right, average of 1,000 realizations; each for the
Additive, House of Cards, Ising and Eggbox models; x-axis: out-degree (0 to 5).]

Figure 4: Out-degree distributions for model landscapes. Distributions of out-degree (number of
fitter neighbors, over all genotypes of the landscape) for four different types of interactions.
The left panel reports results for a single realization, whereas the right panel reports averages
over 1,000 replicates.

Interestingly, only the number of chain steps is contained in the out-degree distribution. Indeed,
neither chain depth nor the number of chain origins can be computed from the out-degree
distribution alone. Both values depend on the structure of the chain tree itself. A star-like tree,
as expected in additive landscapes, has several origins and a depth of only 1, whereas a deep chain
with a single origin is expected under an IMF model (Ising Mount Fuji, a mixture of an additive
landscape with Ising pairwise interactions; see details of the models in the Appendix). In this
last case, the chain corresponds to the successive replacement of alleles at neighboring loci,
starting from one end of the sequence of loci and proceeding to the other. Chain depth and the
number of origins are related to correlations in the out-degree of nodes along fitness-increasing
paths. This stresses the interest of studying both the out-degree distribution and the chain trees
of fitness landscapes.
3.2 Connection between chains and the amount of epistasis

In an additive landscape, all $L$ genotypes around the peak are origins of a chain tree that ends
at the peak. This star-like chain tree of depth 1 around the peak is the only chain tree in these
landscapes. In contrast, in a HoC model, there are many small and slightly deeper chain trees;
their number is of the order of $2^L/(L+1)$ (see proof in the Appendix). Therefore, it would be
tempting to conclude that the number of chain trees (as well as other chain measures) is correlated
with the amount of epistasis. However, this is not the case.

For example, in an RMF model with equal additive effects at all loci, it is possible to derive
exact theoretical results for the mean of some chain tree measures, most notably the number of
chain steps. The equations are presented in Appendix B. The results are illustrated in Figure 5,
where the mean values of several chain measures are reported as a function of the additive
component of the RMF model. Because the HoC component is fixed (fixed $\sigma_{HoC}$), the higher
$\mu_a$, the more additive the model ($\mu_a$ is the mean additive effect of mutations);
consequently, the model becomes additive when $\mu_a \to \infty$, whereas it converges to a HoC model
when $\mu_a \to 0$.

Furthermore, even though chain measures for other models (e.g. IMF and EMF) cannot be computed
analytically, they can be obtained from simulations (Figure 6). Interestingly, several of the chain
measures appear to be non-monotonic with respect to epistasis in the RMF model, and even more so in
the IMF model. Both the number of steps and the chain depth tend to have a maximum for intermediate
values of epistasis, when the contributions of the additive component and of the interactions are
comparable (Figures 5 and 6). The type of interaction also plays an important role in the structure
of the chains, as only short chains are observed for the RMF and EMF models, whereas long chains
are observed for the IMF model. Furthermore, the size of the landscape also changes the dependence
between epistasis and chain measures (depth and abundance) (Figure ??).

3.3 Generalized chains 

We can generalize the concept of chain by including cases where there is more than one fitness- 
increasing mutation from a given genotype, but all paths starting with these mutations eventually 
lead to the same genotype. 

We can define a generalized chain step as a pair of genotypes g → g' such that all fitness-increasing paths from g pass through g'. Note that in this case a step can encompass one or more genotypes between g and g', whereas there are none in the original definition. A generalized chain can then be defined as a sequence of generalized chain steps. To compute its depth, we count all the genotypes the chain goes through.
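Generalized chain steps can be found by propagating, from the fittest genotypes downwards, the set of genotypes that every fitness-increasing path must visit. The sketch below does this with set intersections, which is a dominator-style computation on the fitness graph. It assumes there are no fitness ties between neighbours, and the function names are hypothetical.

```python
import numpy as np


def must_pass_sets(f, L):
    """M[g]: genotypes on every fitness-increasing path from g to a peak (g included)."""
    M = [None] * len(f)
    for g in sorted(range(len(f)), key=lambda x: f[x], reverse=True):   # fittest first
        up = [g ^ (1 << i) for i in range(L) if f[g ^ (1 << i)] > f[g]]
        common = set.intersection(*(M[h] for h in up)) if up else set()
        M[g] = {g} | common
    return M


def generalized_steps(f, L):
    """Map each genotype g to g', the first genotype that every uphill path from g visits."""
    M = must_pass_sets(f, L)
    # fitness increases along every path, so the first shared genotype is the least fit one
    return {g: min(M[g] - {g}, key=lambda h: f[h]) for g in range(len(f)) if len(M[g]) > 1}


# Additive landscape on L = 3 loci in which every mutation is beneficial
f = np.array([bin(g).count("1") for g in range(8)], dtype=float)
print(generalized_steps(f, 3))   # every genotype except the peak 7 maps to 7
```

Every strict chain step is also a generalized step, with g' equal to its unique fitter neighbour.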

Generalized chains can also be interpreted in terms of constraints on evolution, but they are intrinsically non-local, so they are not related to the out-degree distribution of the fitness graph. Instead, the total size (number of genotypes in the chain steps) of the generalized subchain ending at a genotype g' corresponds to the size of the exclusive basin of attraction of that genotype, i.e. the set of genotypes that will inevitably evolve through g': any path of increasing fitness starting at these genotypes goes through g'.

Figure 5: Chain measures (number of steps, number of chains, number of origins) as a function of the additive fitness effect μ_a for an RMF landscape with σ_a = 0, σ_HoC = 1. Lines represent the analytical mean values, while dots represent the average over 10^? simulations.

Figure 6: Epistasis and chain measures for different landscape models with L = 5. The landscapes are built from an additive component with mean μ_a and σ_a = μ_a/10 and an epistatic component. The epistatic component is: (RMF) an HoC model with σ_HoC = 1; (IMF) an Ising model with mean incompatibility cost μ_c = 1 and variance σ_c² = 0.1; (EMF) an eggbox model with mean fitness effect μ_E = 1 and variance σ_E² = 0.1. We plot epistasis as measured by γ and r/s, and chains as the number of chain steps and the maximum depth.

This is particularly interesting for peaks, for which it captures their exclusive basins of attraction. Non-exclusive basins of attraction are usually defined as the sets of genotypes from which the peak can be reached (Kauffman, 1993), and their size is commonly interpreted as a measure of the evolvability of the landscape. The exclusive basin of attraction of a peak measures how many genotypes have an obligatory end point, i.e. how many genotypes are forced to evolve towards a single final state.
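Exclusive and non-exclusive basins of attraction can be computed together by propagating the set of reachable peaks down the fitness graph. The minimal sketch below makes the same assumption as before (no fitness ties between neighbours). The function name is hypothetical, and each peak is counted as a member of its own basin.

```python
import numpy as np


def basin_sizes(f, L):
    """Return {peak: (basin size, exclusive basin size)}.

    The basin of a peak holds every genotype from which some fitness-increasing
    path reaches it; the exclusive basin holds the genotypes from which every
    such path ends at that peak. Peaks count as members of their own basins.
    """
    reach = [None] * len(f)
    for g in sorted(range(len(f)), key=lambda x: f[x], reverse=True):   # fittest first
        up = [g ^ (1 << i) for i in range(L) if f[g ^ (1 << i)] > f[g]]
        reach[g] = set().union(*(reach[h] for h in up)) if up else {g}
    peaks = sorted({p for r in reach for p in r})
    return {p: (sum(p in r for r in reach), sum(r == {p} for r in reach)) for p in peaks}


# Two-locus eggbox: 00 and 11 are peaks; 01 and 10 can reach both of them
f = np.array([1.0, 0.0, 0.0, 1.0])
print(basin_sizes(f, 2))   # {0: (3, 1), 3: (3, 1)}
```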
4 Relations between measures


In this section we discuss the relations existing between the newly proposed measures and the 
existing ones. 

We expect the new measures of epistasis γ and γ* to be correlated with other measures of epistasis. In fact, we already discussed how they are related to some of the existing measures. In particular: (i) for pairs of loci, γ is directly related to the common definition of epistasis e, as 1 − γ ∝ e². For the whole landscape we have $\gamma = 1 - \mathbb{E}[e^2]/(2\,\mathbb{E}[s^2])$, while for a pair of mutations $\gamma_{ij} = 1 - \mathbb{E}[e_{ij}^2]/(2\,\mathbb{E}[s_i^2])$; (ii) for the whole landscape, γ_d can be rewritten as a function of the fitness correlation functions ρ_d as $\gamma_d = (\rho_d - \rho_{d+1})/(1 - \rho_1)$; (iii) γ* is directly related to the fractions φ_s and φ_rs of square motifs with sign and reciprocal sign epistasis, as $\gamma^* = 1 - (\varphi_s + 2\varphi_{rs})$.
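The three relations above are identities that can be checked numerically. The sketch below rests on several assumptions made for illustration. It takes γ_d to be the ratio Σ s_i(g) s_i(g_d) / Σ s_i(g)², summed over all genotypes g, loci i and sets of d other mutated loci. It takes γ* to be the same statistic computed on the signs of the effects. It takes ρ_d to be the correlation of fitness over all pairs of genotypes at Hamming distance d. These estimator forms and all function names are assumptions.

```python
import itertools
import numpy as np


def effects(f, L):
    """s[g, i] = f(g with locus i mutated) - f(g)."""
    g = np.arange(len(f))
    return np.stack([f[g ^ (1 << i)] - f[g] for i in range(L)], axis=1)


def gamma_d(f, L, d, signs=False):
    """Correlation of the effect of mutation i in g and in g plus d other mutations."""
    s = np.sign(effects(f, L)) if signs else effects(f, L)
    g, num, den = np.arange(len(f)), 0.0, 0.0
    for i in range(L):
        for D in itertools.combinations([j for j in range(L) if j != i], d):
            m = sum(1 << j for j in D)
            num += np.dot(s[:, i], s[g ^ m, i])
            den += np.dot(s[:, i], s[:, i])
    return num / den


def rho_d(f, L, d):
    """Fitness correlation between genotypes at Hamming distance d."""
    x, g = f - f.mean(), np.arange(len(f))
    masks = [sum(1 << j for j in D) for D in itertools.combinations(range(L), d)]
    return np.mean([np.dot(x, x[g ^ m]) for m in masks]) / np.dot(x, x)


def square_fractions(f, L):
    """Fractions of square motifs with sign (phi_s) and reciprocal sign (phi_rs) epistasis."""
    sg, g = np.sign(effects(f, L)), np.arange(len(f))
    n_s = n_rs = n = 0
    for i, j in itertools.combinations(range(L), 2):
        ci = sg[:, i] != sg[g ^ (1 << j), i]      # does mutation i change sign across j?
        cj = sg[:, j] != sg[g ^ (1 << i), j]      # does mutation j change sign across i?
        n_s += np.sum(ci ^ cj)
        n_rs += np.sum(ci & cj)
        n += len(f)
    return n_s / n, n_rs / n


rng = np.random.default_rng(0)
L = 5
f = rng.normal(size=2**L)
s = effects(f, L)
e2 = np.mean([(s[np.arange(2**L) ^ (1 << j), i] - s[:, i]) ** 2
              for i in range(L) for j in range(L) if j != i])
phi_s, phi_rs = square_fractions(f, L)
print(gamma_d(f, L, 1), 1 - e2 / (2 * np.mean(s**2)))                             # relation (i)
print(gamma_d(f, L, 2), (rho_d(f, L, 2) - rho_d(f, L, 3)) / (1 - rho_d(f, L, 1)))  # relation (ii)
print(gamma_d(f, L, 1, signs=True), 1 - (phi_s + 2 * phi_rs))                      # relation (iii)
```

Under these definitions each printed pair agrees to floating-point precision, because all three relations are exact rearrangements of the same sums.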

Furthermore, it is possible to show that γ_d is a function of the Fourier spectrum of the landscape, provided that the standard orthonormal basis is used for the Fourier series. Let W_J denote the normalized weight of the coefficients of order J in the Fourier spectrum, i.e. the sum of the squared coefficients of all J-loci interactions normalized by the sum of all squared coefficients. The relation is then

$$\gamma_d = 1 - 2\,\frac{\sum_{J=1}^{L} J\,W_J \sum_{k\ \mathrm{odd}} \binom{J-1}{k}\binom{L-J}{d-k}\Big/\binom{L-1}{d}}{\sum_{J=1}^{L} J\,W_J} \qquad (16)$$

(see proof in the Appendix), where the inner sum is the probability that d random background mutations hit an odd number of the J − 1 other loci of a J-locus interaction. Our measure of epistasis (d = 1) is therefore

$$\gamma = 1 - 2\,\frac{\sum_{J=1}^{L} J(J-1)\,W_J}{(L-1)\sum_{J=1}^{L} J\,W_J} \qquad (17)$$

which resembles another measure of epistasis, $\sum_{J=2}^{L} W_J \big/ \sum_{J=1}^{L} W_J$ (Szendro et al., 2013), showing again the close relation with previous measures of epistasis. The main difference is the weight of higher-order interactions: the contribution of J-loci interactions to γ grows like J² for large J, so that the effect of interactions is stronger if they involve more loci.
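Equations (16) and (17) can be checked against a direct computation. The sketch below obtains the Fourier weights W_J with a fast Walsh-Hadamard transform. The sign convention of the basis does not affect squared coefficients, and the constant term is left out because the normalization cancels from the ratio. It then compares eq. (16) with γ_d computed from its assumed definition as a ratio of sums over genotypes, loci and sets of d background mutations. Function names are hypothetical.

```python
import itertools
from math import comb

import numpy as np


def fourier_weights(f, L):
    """W[J-1] = share of squared Walsh-Hadamard coefficients of order J, for J = 1..L."""
    c = np.array(f, dtype=float)
    h = 1
    while h < len(c):                          # iterative fast Walsh-Hadamard transform
        for start in range(0, len(c), 2 * h):
            a = c[start:start + h].copy()
            b = c[start + h:start + 2 * h].copy()
            c[start:start + h], c[start + h:start + 2 * h] = a + b, a - b
        h *= 2
    order = np.array([bin(S).count("1") for S in range(len(c))])
    B = np.array([np.sum(c[order == J] ** 2) for J in range(1, L + 1)])
    return B / B.sum()


def gamma_d_fourier(W, L, d):
    """Eq. (16): gamma_d from the Fourier weights."""
    num = den = 0.0
    for J, w in enumerate(W, start=1):
        # probability that d random background mutations hit an odd number of the J - 1 partners
        p_odd = sum(comb(J - 1, k) * comb(L - J, d - k) for k in range(1, d + 1, 2)) / comb(L - 1, d)
        num += J * w * p_odd
        den += J * w
    return 1 - 2 * num / den


def gamma_d_direct(f, L, d):
    """gamma_d as a ratio of sums of products of effects of the same mutation."""
    g, num, den = np.arange(len(f)), 0.0, 0.0
    for i in range(L):
        s = f[g ^ (1 << i)] - f                # effect of mutating locus i in every genotype
        for D in itertools.combinations([j for j in range(L) if j != i], d):
            m = sum(1 << j for j in D)
            num += np.dot(s, s[g ^ m])
            den += np.dot(s, s)
    return num / den


L = 5
f = np.random.default_rng(2).normal(size=2**L)
W = fourier_weights(f, L)
for d in range(1, L):
    print(d, gamma_d_fourier(W, L, d), gamma_d_direct(f, L, d))
```

For d = 1 the inner probability reduces to (J − 1)/(L − 1), which gives eq. (17).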

We also discussed how the number of chain steps, the number of peaks and the number of sinks correspond to three different components of the out-degree distribution of the fitness graph. On the other hand, we also suggested that there is no direct relation between the amount of epistasis and the number of chains.
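This correspondence with the out-degree distribution is easy to make explicit. In the sketch below, edges of the fitness graph point from each genotype to its fitter neighbours. Peaks therefore have out-degree 0 and chain steps out-degree 1. Assuming sinks are local fitness minima, they have out-degree L. The function name is hypothetical.

```python
import numpy as np


def out_degree_distribution(f, L):
    """counts[k] = number of genotypes with exactly k fitter neighbours."""
    g = np.arange(len(f))
    out_degree = sum((f[g ^ (1 << i)] > f).astype(int) for i in range(L))
    return np.bincount(out_degree, minlength=L + 1)


L = 5
f = np.random.default_rng(3).normal(size=2**L)      # an HoC landscape as an example
counts = out_degree_distribution(f, L)
print("peaks:", counts[0], "chain steps:", counts[1], "sinks:", counts[L])
```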

To evaluate the relations between these and other measures more systematically, we perform a correlation analysis similar to that of Szendro et al. (2013), but using models instead of experimental landscapes. We select the number of peaks, the number of sinks, the roughness/slope ratio (ratio between the epistatic “noise” and the additive component, see Appendix), γ and γ* as measures of epistasis, plus the number of chain steps and the maximum chain depth. We compute the Spearman correlation coefficients of all pairs of measures in the RMF, IMF and EMF landscape models, varying the model parameters (in particular, the ruggedness).
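A reduced version of this analysis for the RMF model can be sketched as follows. As in Table 1, μ_a is drawn log-uniformly in [0.01, 10] with σ_a = μ_a/10 and σ_HoC = 1. Two of the measures (number of peaks and γ) are computed per realization and then rank-correlated. The rank correlation is written out with average ranks for ties, since the number of peaks is an integer and ties are common. The number of realizations and all names are illustrative choices, not the settings used for Table 1.

```python
import numpy as np


def average_ranks(v):
    """Ranks 0..n-1, with tied values sharing their average rank."""
    v = np.asarray(v, dtype=float)
    ranks = np.empty(len(v))
    ranks[np.argsort(v, kind="mergesort")] = np.arange(len(v))
    for value in np.unique(v):
        tied = v == value
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of average ranks."""
    return np.corrcoef(average_ranks(x), average_ranks(y))[0, 1]


def peaks_and_gamma(f, L):
    g = np.arange(len(f))
    s = np.stack([f[g ^ (1 << i)] - f for i in range(L)], axis=1)   # s[g, i]
    peaks = int(np.sum(np.all(s < 0, axis=1)))                      # all neighbours less fit
    num = sum(np.dot(s[:, i], s[g ^ (1 << j), i]) for i in range(L) for j in range(L) if j != i)
    return peaks, num / ((L - 1) * np.sum(s**2))


rng = np.random.default_rng(4)
L, rows = 5, []
alleles = (np.arange(2**L)[:, None] >> np.arange(L)) & 1
for _ in range(300):
    mu_a = 10 ** rng.uniform(-2, 1)                                  # log-uniform in [0.01, 10]
    f = alleles @ rng.normal(mu_a, mu_a / 10, size=L) + rng.normal(0, 1, size=2**L)
    rows.append(peaks_and_gamma(f, L))
peaks, gamma = np.array(rows).T
print("Spearman(peaks, gamma) =", spearman(peaks, gamma))
```

With this sign convention the raw coefficient comes out negative, since more peaks go with lower γ.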
            peaks    sinks    r/s      γ        γ*       steps    depth
RMF model
peaks       1        -        -        -        -        -        -
sinks       0.53     1        -        -        -        -        -
r/s         0.60     0.62     1        -        -        -        -
γ           0.73     0.71     0.76     1        -        -        -
γ*          0.77     0.73     0.71     0.88     1        -        -
steps       0.09     0.01     0.001    5·10⁻⁴   4·10⁻⁵   1        -
depth       0.007    0.11     0.08     0.07     0.08     0.34     1

IMF model
peaks       1        -        -        -        -        -        -
sinks       0.69     1        -        -        -        -        -
r/s         0.72     0.45     1        -        -        -        -
γ           0.73     0.46     0.99     1        -        -        -
γ*          0.72     0.73     0.72     0.73     1        -        -
steps       0.25     0.01     0.09     0.09     ?        1        -
depth       4·10⁻⁵   0.08     0.02     0.02     0.20     0.68     1

EMF model
peaks       1        -        -        -        -        -        -
sinks       1.00     1        -        -        -        -        -
r/s         0.74     0.74     1        -        -        -        -
γ           0.74     0.74     0.97     1        -        -        -
γ*          0.99     0.99     0.74     0.74     1        -        -
steps       0.97     0.96     0.71     0.71     0.95     1        -
depth       0.93     0.93     0.67     0.68     0.91     0.96     1

Table 1: Spearman correlation of pairs of measures across 10^? realizations of the RMF, IMF and EMF models with L = 5, σ_HoC = 1 (for RMF), μ_c = 1 and σ_c = 0.2 (for IMF), μ_E = 1 and σ_E = 0.2 (for EMF), σ_a = μ_a/10 and μ_a log-uniformly distributed in [0.01, 10]. These numbers correspond to the scale used in Figure 6.
The pairwise correlations (Table 1) confirm the intuition that the measures related to epistasis are all strongly correlated, with, as expected, a strong correlation between γ and γ* (the strongest one in the RMF model). For RMF and IMF, on the other hand, the correlation between these measures and the chain measures is especially low; this is not the case for the EMF model. This shows that the chain measures quantify some landscape properties that are not simply correlated with the amount of epistasis.

Interestingly, the behavior of chains with epistasis is also apparent when we compare them across different models of epistatic interactions. In Figure 6 we show what happens in fitness landscapes with an additive contribution plus different epistatic models (HoC, Ising and Eggbox). The comparison between measures of epistasis and chain measures clearly shows that there is no simple relation between them, as the three models of interactions behave differently. Chain measures appear to depend strongly on the nature of the epistatic interactions, and could therefore provide useful, independent information. The strong correlations between chain measures and epistasis observed for EMF reflect the fact that, for EMF with a variable additive component, the chain measures are an almost piecewise-constant function with two levels.


5 Measures on two experimental landscapes 


As an example of application of the new measures γ and chains, we use them to analyze two complete experimental landscapes of size L = 5. These landscapes are illustrated in Figure 7.

The first landscape is the landscape of antibiotic (cefotaxime) resistance of β-lactamase mutations in an Escherichia coli plasmid from Weinreich et al. (2006) (Figure 7, left). The 5 mutations have very strong effects, together giving a 4 × 10^? increase in antibiotic resistance, and were therefore selected together. Given the huge selective advantage of the combined mutations, this landscape is single-peaked, with the peak corresponding to the five-point mutant. It also has a single sink, which, interestingly, does not correspond to the wild type.

The second is one of the four L = 5 complete sublandscapes (csl) (Franke et al., 2011) of a larger landscape (L = 8) of deleterious mutations in Aspergillus niger from de Visser et al. (1997) (Figure 7, right). This landscape is a combination of unrelated deleterious mutations whose epistatic interactions were not filtered by natural selection. It has 4 peaks and 2 sinks; in fact, it is currently one of the most rugged among the completely resolved landscapes.

As the landscapes were derived in completely different settings (co-selected beneficial mutations for β-lactamase and random deleterious mutations for Aspergillus), we might not be surprised to find that they exhibit very different structures. Indeed, theoretical arguments support the intuition that landscapes of co-selected mutations differ radically from landscapes of random mutations (Draghi and Plotkin, 2013; Greene and Crona, 2014; Blanquart et al., 2014). The difference in ruggedness between the β-lactamase and Aspergillus landscapes is confirmed by the values of γ (0.85 vs 0.33) and r/s (0.43 vs 0.89).
Figure 7: Values of several measures applied to two experimental landscapes (left: Weinreich et al., 2006; right: de Visser et al., 1997). a) Illustrations of the landscapes using Magellan (Brouillet et al., submitted). b) (left) Interactions between pairs of mutations: blue = no interaction, white = strong random interaction, red = strong interaction in sign; (right) decay of γ_d with Hamming distance. c) Chain steps in the landscape. d) Measures for the landscape.
To further explore the landscapes, we compute the matrices of pairwise γ values to illustrate and summarize the interactions between mutations (Figure 7b). In the β-lactamase landscape, there are some clear interactions between mutations (between the 2nd or the 4th and the 1st mutation, or between the 5th and the 4th), but none of these interactions is characterized by strong sign epistasis (no red cell). On the other hand, the Aspergillus landscape contains several examples of interactions dominated by strong sign epistasis (for example, between the 2nd or the 4th and the 1st or the 5th mutations).
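A pairwise matrix of this kind can be sketched by restricting γ to one ordered pair of loci (i, j): correlate the effect of mutation i with and without mutation j, so that 1 − γ_ij = E[e_ij²]/(2E[s_i²]). Values near 1 then indicate no interaction, values near 0 strong random interaction, and negative values interactions dominated by sign epistasis. The function name and the convention of leaving the diagonal undefined are assumptions.

```python
import numpy as np


def pairwise_gamma(f, L):
    """G[i, j]: correlation of the effect of mutation i across the presence of mutation j."""
    g = np.arange(len(f))
    G = np.full((L, L), np.nan)                     # diagonal left undefined
    for i in range(L):
        s_i = f[g ^ (1 << i)] - f
        for j in range(L):
            if j != i:
                G[i, j] = np.dot(s_i, s_i[g ^ (1 << j)]) / np.dot(s_i, s_i)
    return G


L = 4
add = np.array([bin(g).count("1") for g in range(2**L)], dtype=float)   # additive
egg = np.array([(-1.0) ** bin(g).count("1") for g in range(2**L)])      # pure eggbox
print(pairwise_gamma(add, L))   # off-diagonal entries equal 1
print(pairwise_gamma(egg, L))   # off-diagonal entries equal -1
```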

Similar conclusions come from the analysis of the decay of γ_d with distance. In the β-lactamase landscape, γ_d drops immediately but then decays slowly after the first mutation (Figure 7, left), resembling the behavior of RMF models.

An interesting example of the power of γ_d is the Aspergillus landscape (Figure 7, right). This landscape shows a non-monotonic decay, with the correlation γ_d bouncing up and down. This indicates a clear compensatory structure of reciprocal sign epistasis, which is not only due to pairwise compensation but extends to distance 4, i.e. to the whole landscape. In fact, the behavior of γ_d suggests a mixture of an RMF landscape and an extreme case of compensatory interactions, like the Eggbox model. This surprising result does not emerge straightforwardly from other measures, not even from the Fourier spectrum (Neidhart et al., 2013). Indeed, although the highest-order coefficient of the Fourier decomposition measures the amount of eggbox, it has to be compared with lower-order coefficients, which intermingle in a complex way when epistasis is not purely reciprocal-sign at all orders. These coefficients also contribute to the behaviour of γ_d.
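The signature described here can be reproduced on model landscapes. For a pure eggbox, the effect of a mutation flips sign with every additional background mutation, so γ_d = (−1)^d. For HoC, γ_d is close to 0 in expectation, and a mixture of the two oscillates with a reduced amplitude. In the sketch below, γ_d is computed from its assumed definition as a ratio of sums. The function name is hypothetical and the mixture weight is arbitrary.

```python
import itertools
import numpy as np


def gamma_d(f, L, d):
    """Correlation of the effect of a mutation in g and in g plus d other mutations."""
    g, num, den = np.arange(len(f)), 0.0, 0.0
    for i in range(L):
        s = f[g ^ (1 << i)] - f
        for D in itertools.combinations([j for j in range(L) if j != i], d):
            m = sum(1 << j for j in D)
            num += np.dot(s, s[g ^ m])
            den += np.dot(s, s)
    return num / den


L = 5
rng = np.random.default_rng(5)
eggbox = np.array([(-1.0) ** bin(g).count("1") for g in range(2**L)])
hoc = rng.normal(size=2**L)
for name, f in [("eggbox", eggbox), ("HoC", hoc), ("mixture", 0.5 * eggbox + hoc)]:
    print(name, [round(gamma_d(f, L, d), 2) for d in range(1, L)])
```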

Finally, for both landscapes, the number of steps and the maximum depth of chains are higher than expected theoretically. For the β-lactamase landscape, the strength of epistasis suggests that it should be close enough to an additive model to have a single chain of depth 1 with five steps ending at the peak, while we observe two chains with nine steps and a maximum depth of 3. This suggests that epistasis in this landscape is not random, but is structured in a way that constrains evolution and that is not captured by any of the models presented here. On the other hand, the number of steps and the depth of the single chain in the Aspergillus landscape are higher than, but not too different from, the values expected for an RMF or EMF landscape, which is consistent with the above suggestion that this landscape resembles a mixture of RMF and EMF.


6 Discussion 


In this work, we presented two new sets of landscape measures, which have a simple interpretation and cover a range of potential applications. These measures, among others, have been implemented in MAGELLAN, a graphical tool to explore small fitness landscapes (Brouillet et al., submitted).


The first application is the measurement of epistasis in a way that is comparable across landscapes. The correlation of fitness effects γ is a natural measure for this. It can also be used for pairs of mutations, to explore the strength of epistatic interactions between mutations in a compact way.

In terms of γ, there is a natural scale for the strength of epistatic interactions, from purely additive interactions (γ = 1), through strong random interactions (γ = 0), to a fully compensatory landscape (γ = −1). Interactions in landscapes with γ < 0 are dominated by strong sign and reciprocal sign epistasis between most loci; we therefore expect such landscapes to be rare and possible only for some sets of mutations, as selection tends to favor mutations with positive interactions. In fact, the two experimental fitness landscapes analyzed here have positive values of γ. Yet the amount and strength of epistasis in the landscape by de Visser et al. is remarkably high: γ = 0.33 means that the fitness effect of a mutation in a given genotype is a poor predictor of the fitness effect of the same mutation in a neighboring genotype that differs by only a single mutation.

Correlations of fitness effects are not only useful to quantify epistasis. Their decay γ_d contains information on the nature of epistatic interactions and can reveal interesting signals. A clear example is the Aspergillus landscape studied here. The correlations γ_d for this landscape show an oscillatory behaviour instead of the decay expected for random epistasis (i.e. HoC-like) or for incompatibilities (i.e. Ising-like), pointing towards a strong contribution of “eggbox-like” epistasis (reciprocal sign epistasis across multiple mutations). While the presence of pairwise reciprocal sign epistasis is not strange (it is actually quite common in compensatory interactions), the fact that reciprocal sign epistasis involves the whole landscape is quite surprising. In other words, starting from a first mutation chosen to be deleterious, it is not unreasonable that the second mutation could have a compensatory effect, but the mechanism behind the deleterious effect of the third mutation and the compensatory effect of the fourth mutation is obscure. It may relate to complex pathways of interactions at the molecular level.

Many measures can only be computed if one has the fitnesses of all combinations of the set of mutations (or of subsets of them). For example, the number of peaks loses its meaning in a landscape with missing data, since the definition of a fitness maximum requires knowledge of the fitness of all its neighbors. Since the fitness correlation functions ρ_d can be computed even with missing data, the correlation of fitness effects can be estimated from equation (11) even for very sparse landscapes. The sparseness of the landscape could increase the error of the estimate; however, this effect could be compensated by the larger size of the landscape. Landscapes containing a larger number of mutations would also be more representative of real gene or protein landscapes.
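The estimate from incomplete data can be sketched as follows. ρ_d is computed only over pairs of measured genotypes at distance d, centred by the mean of the measured fitnesses, and γ_d then follows from eq. (11). With a complete landscape this reproduces the direct γ_d exactly. With missing genotypes it is only an estimate, whose bias and variance are not characterized here. The masking scheme and function names are hypothetical.

```python
import itertools
import numpy as np


def rho_d_observed(f, observed, L, d):
    """Fitness correlation at distance d using only pairs of measured genotypes."""
    x = np.where(observed, f - f[observed].mean(), 0.0)
    g = np.arange(len(f))
    num = cnt = 0.0
    for D in itertools.combinations(range(L), d):
        m = sum(1 << j for j in D)
        both = observed & observed[g ^ m]           # pairs where both genotypes were measured
        num += np.sum(x[both] * x[g ^ m][both])
        cnt += np.sum(both)
    return (num / cnt) / np.mean(x[observed] ** 2)


def gamma_d_from_rho(f, observed, L, d):
    """Eq. (11): gamma_d = (rho_d - rho_{d+1}) / (1 - rho_1)."""
    r1, rd, rd1 = (rho_d_observed(f, observed, L, k) for k in (1, d, d + 1))
    return (rd - rd1) / (1 - r1)


L = 8
rng = np.random.default_rng(10)
A = (np.arange(2**L)[:, None] >> np.arange(L)) & 1
f = A @ rng.normal(1.0, 0.1, L) + rng.normal(0.0, 0.5, 2**L)     # an RMF landscape
full = np.ones(2**L, dtype=bool)
sparse = rng.random(2**L) < 0.3                                   # keep about 30% of genotypes
print(gamma_d_from_rho(f, full, L, 1), gamma_d_from_rho(f, sparse, L, 1))
```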

For some landscapes, only fitness ranks or the beneficial/deleterious nature of the fitness effects can be determined experimentally. Our measure γ* is appropriate for these landscapes. While γ depends not only on the sign of epistasis but is also sensitive to its strength, γ* is based essentially on the fitness graph and therefore depends only on the sign of epistasis. γ and γ* are strongly correlated across fitness landscape models. Thus a mismatch between γ and γ* in real landscapes could point to some peculiar nature of the epistatic interactions.

Finally, the γ and γ_d measures could also be useful to estimate parameters of theoretical landscape models from empirical data, thanks to the availability of approximate analytical formulae for these quantities. For example, assuming that the underlying model of a landscape is the NK model, the measure K̂ = (L − 1)(1 − γ) is an approximately unbiased method-of-moments estimator of the parameter K, i.e. E[K̂] ≈ K (see eq. ?). A similar approach can be used for the parameters of other landscape models. The potential of these measures for model inference and goodness-of-fit tests remains to be studied.
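As an illustration of the estimator K̂ = (L − 1)(1 − γ), the sketch below simulates NK landscapes and averages K̂ over realizations. It assumes cyclic adjacent neighbourhoods (locus i interacts with loci i+1, ..., i+K, modulo L) and standard normal contributions F_i. Both choices and all function names are hypothetical. Since E[K̂] ≈ K holds only approximately (ratio of expectations), the printed means are rough checks rather than exact values.

```python
import numpy as np


def nk_landscape(L, K, rng):
    """NK fitness of all 2**L genotypes with cyclic adjacent neighbourhoods (an assumption)."""
    g = np.arange(2**L)
    f = np.zeros(2**L)
    for i in range(L):
        loci = [(i + k) % L for k in range(K + 1)]
        table = rng.normal(size=2 ** (K + 1))                      # contribution F_i
        index = sum(((g >> j) & 1) << pos for pos, j in enumerate(loci))
        f += table[index]
    return f


def gamma_1(f, L):
    """Correlation of the effect of a mutation with and without one other mutation."""
    g = np.arange(len(f))
    s = np.stack([f[g ^ (1 << i)] - f for i in range(L)], axis=1)
    num = sum(np.dot(s[:, i], s[g ^ (1 << j), i]) for i in range(L) for j in range(L) if j != i)
    return num / ((L - 1) * np.sum(s**2))


def k_hat(f, L):
    """Method-of-moments estimate of K from the correlation of fitness effects."""
    return (L - 1) * (1 - gamma_1(f, L))


rng = np.random.default_rng(6)
L = 8
for K in range(L):
    estimates = [k_hat(nk_landscape(L, K, rng), L) for _ in range(200)]
    print(K, round(float(np.mean(estimates)), 2))
```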

The other novel measures that we proposed are the number of steps and the depth of chains, i.e. of mutations that are obligatory under the strong selection regime in the landscape. These measures have an immediate evolutionary interpretation in terms of evolvability and constraints, yet they show peculiar properties. The most relevant one is that they are often non-monotonic in epistasis, as we have shown analytically. The lack of correlation between epistasis and chains shows that these new statistics can be used to obtain independent information about the nature of the interactions, rather than their strength.

Interestingly, the number of chain steps, the chain depth (Figure 6) and the total number of accessible paths (Szendro et al., 2013) all seem to peak at intermediate values of epistasis. All these measures are related both to evolvability (deep chains show that fitness-increasing paths are open) and to constraints (chains represent obligatory paths in evolution). This suggests an evolutionary interpretation of these peaks in terms of the tradeoff between evolvability (higher at low epistasis) and constraints (stronger at high epistasis). Previous measures that were shown, numerically and by some analytical approximations, to be non-monotonic include the total number of accessible paths (Szendro et al., 2013) and the number of exceedances, i.e. the number of available fitness-increasing mutations after an evolutionary step (Neidhart et al., 2014), which are also related to evolvability and constraints. It is worth mentioning that chains and exceedances are both related to the out-degree distribution after one step of increasing fitness. This suggests that, beyond the out-degree distribution, it may be worth characterizing sequences of out-degrees, e.g. the out-degree distribution along evolutionary paths.

Real landscapes tend to have longer chains than expected from theoretical landscape models with random epistasis. This is the case for both experimental landscapes studied here. It is especially apparent and interesting in the landscape by Weinreich et al., since it implies a highly non-random structure of epistatic interactions for this set of mutations. This result has been found independently by randomization tests (Weinreich, personal communication) and cannot be seen in complex measures of epistasis like the Fourier spectrum or the decay of correlations γ_d; our chain measures, however, were able to capture this signal.

The notion of obligatory steps can easily be widened using the above definition of generalized chains. However, it is worth mentioning that generalized chains say little about the local constraints in the fitness landscape. On the contrary, they give information on the exclusive basins of attraction of peaks. Therefore, the sizes of exclusive basins of attraction (generalized chains) are typically very informative about convergence in state (similar endpoints regardless of the path): the larger the exclusive basins, the less uncertainty remains about the final state, given a starting genotype. Strict chains, on the contrary, report local information about the landscape. They measure how paths are constrained but do not necessarily predict the final state. Quite clearly, strict chains are informative about convergence in path (similar paths regardless of the endpoints) rather than convergence in state. An interesting perspective would be to redefine chains on the mutations themselves rather than on the genotypes (e.g. a mutation at locus i is always followed by a mutation at locus j, independently of the genotype).

Chains are natural measures both from the evolutionary and from the mathematical point of view, since they are related to the out-degree distribution of the fitness graph. For the small complete landscapes currently available, the numbers of peaks, chain steps and sinks summarize most of the information present in the out-degree distribution. However, for larger or incomplete landscapes, other components of the out-degree distribution (or other properties, like its variance) could be useful as measures for a finer characterization of fitness landscapes.

We are still far from predicting evolution on real landscapes based on their measures, partly because of our incomplete knowledge of the structure of real landscapes, and partly because of the lack of measures with a natural evolutionary interpretation. In the future, we expect a strong increase in the number of experimentally resolved empirical landscapes. The measures that we propose here will therefore find applications in the understanding and classification of these landscapes, as well as in studies of model landscapes. The correlation of fitness effects is a natural measure of epistasis that is comparable across landscapes, while the decay of correlations with mutational distance and the new chain measures will be useful tools to discriminate between and classify these landscapes. Chains also highlight the interplay of constraints and evolvability that influences evolution on complex landscapes.
A Appendices

A.1 Common landscape measures


In this section we present some common landscape measures that can be applied as statistics for experimental landscape data. Notation: a genotype g is a sequence of alleles g = (A_1 A_2 A_3 ... A_L) of length L. For biallelic landscapes, A_i ∈ {0, 1} and S_i = 2A_i − 1.

Some of the most common measures for fitness landscapes f(g) are: 


Number of peaks (Weinberger, 1991): the number of genotypes such that all their neighbours have lower fitness, i.e. the number of local fitness maxima.

r/s (roughness/slope) ratio (Aita et al., 2001): the landscape is fitted to a linear model (a linear combination of the A_i plus a constant) by least squares. The slope s is the average modulus of the coefficients of the A_i. The roughness r is the quadratic mean of the residuals of the regression. The measure of epistasis is their ratio r/s.

Fraction of epistatic interactions (Weinreich et al., 2005; Poelwijk et al., 2007): the fraction of all pairs of mutations, over all possible genotypes, that show magnitude, sign or reciprocal sign epistasis.

Number of accessible paths (Weinreich et al., 2006): assume that the absolute fitness maximum corresponds to the genotype g = (11...1). Count the number of paths of mutations 0 → 1 from g = (00...0) to (11...1) such that fitness increases after each mutation. This is the number of direct accessible paths to the maximum from its antipodal genotype.

Fourier expansion and spectrum (Stadler, 1996; Weinreich et al., 2013; Neidhart et al., 2013): the coefficients of the Fourier expansion are uniquely defined by the Fourier decomposition

$$f(g) = \frac{1}{2^{L/2}}\Big(f_0 + \sum_{J=1}^{L}\ \sum_{\{i_1 \dots i_J\}} f_{i_1 \dots i_J}\, S_{i_1} \cdots S_{i_J}\Big) \qquad (18)$$

where the {i_1 ... i_J} are ordered sets. The Fourier spectrum is defined by the sums of squared coefficients for interactions of J loci, $B_J = \sum_{\{i_1 \dots i_J\}} f_{i_1 \dots i_J}^2$. Epistasis is usually measured by $\sum_{J \ge 2} B_J \big/ \sum_{J \ge 1} B_J$.

More details can be found in the review by Szendro et al. (2013).
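Two of these measures are simple to compute from a complete biallelic landscape stored as an array indexed by genotype (bit i of the index is A_i). The sketch below fits the linear model by least squares for r/s, and counts accessible paths with dynamic programming over genotypes in increasing index order. As in the definition above, it assumes that the fitness maximum is the all-ones genotype. Function names are hypothetical.

```python
import numpy as np


def roughness_slope(f, L):
    """r/s ratio: RMS residual of a least-squares additive fit over mean |coefficient|."""
    A = ((np.arange(len(f))[:, None] >> np.arange(L)) & 1).astype(float)
    X = np.column_stack([np.ones(len(f)), A])            # constant plus one column per A_i
    coef, *_ = np.linalg.lstsq(X, f, rcond=None)
    r = np.sqrt(np.mean((f - X @ coef) ** 2))
    s = np.mean(np.abs(coef[1:]))
    return r / s


def accessible_paths(f, L):
    """Number of fitness-increasing mutational paths from 00...0 to 11...1."""
    paths = np.zeros(len(f), dtype=np.int64)
    paths[0] = 1
    for g in range(len(f)):                      # adding a mutation always increases the index
        for i in range(L):
            h = g | (1 << i)
            if h != g and f[h] > f[g]:
                paths[h] += paths[g]
    return int(paths[-1])


L = 4
rng = np.random.default_rng(7)
additive = ((np.arange(2**L)[:, None] >> np.arange(L)) & 1) @ rng.uniform(0.5, 1.5, L)
rugged = additive + rng.normal(0, 0.5, 2**L)
print(roughness_slope(additive, L), accessible_paths(additive, L))   # about 0, and 4! = 24
print(roughness_slope(rugged, L), accessible_paths(rugged, L))
```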


A.2 Models of fitness landscapes 

In this section we briefly illustrate some common models of fitness landscapes that are used in this study. Most of them are illustrated in Figure ?. Note that we only consider here models of L biallelic loci. A mathematical formulation of these models is given in the next section.

A.2.1 The Additive model (a.k.a. multiplicative model) 

This is a model for non-interacting mutations with independent fitness effects. The fitness is simply the product of the fitness contributions of each locus: the fitness effects of different mutations are multiplied. In log scale, this corresponds to summing the fitness effects of each mutation; for this reason the model is called “additive”. Here, the fitness effects of different mutations are randomly drawn from a Gaussian distribution with mean μ_a and variance σ_a². As there is an independent contribution from each locus, the dimension of interaction is 1 (each locus “interacts” only with itself).

In terms of the Fourier decomposition, all coefficients of second and higher order are zero in this landscape.


A.2.2 The House-of-Cards (HoC) model 


This is a model for random, uncorrelated fitness landscapes (Kingman, 1978). The fitness of each genotype is independent of the fitnesses of other genotypes. Here, it is randomly drawn from a Gaussian with mean 0 and variance σ_HoC². As this model corresponds to full interaction between the loci, the dimension of interaction is L.

In terms of the Fourier decomposition, the coefficients are random variables with a marginal Gaussian distribution centered at 0.


A.2.3 The Rough Mount Fuji (RMF) model 

This model interpolates between additive and uncorrelated fitness landscapes by adding the two (Aita et al., 2000). The fitness is computed as the sum of an additive contribution and an HoC contribution. Here, the model is tuned by three parameters: mean μ_a and variance σ_a² for the additive part, and variance σ_HoC² for the HoC part. (In the literature, this model is often defined with constant additive fitness effects, i.e. σ_a² = 0.) At fixed σ_HoC, the model converges to an additive model when μ_a → ∞ and to an HoC model when μ_a → 0. The dimension of interactions is a mixture of dimension 1 and dimension L.

The Fourier decomposition is linear in the landscape, so it is a combination of the additive and HoC decompositions.


A.2.4 The NK model 

This landscape model with N = L loci interpolates between additive and uncorrelated fitness landscapes by combining uncorrelated fitness contributions (i.e. small HoCs) from L groups of K + 1 loci in an additive way (Kauffman and Weinberger, 1989). There are different ways to choose the groups of interacting loci. Several properties, such as the mean number and mean height of local optima, depend only weakly on the particular choice made (Weinberger, 1991), while others seem to behave quantitatively differently for some interaction choices (Schmiegelt and Krug, 2014). Nonetheless, it has been shown that the fitness correlation function is strictly independent of the interaction choice (Campos et al., 2002); consequently γ does not depend on it either, while the number of chains may still be influenced by it. The number of interacting loci is K + 1 and the interpolation is controlled by the parameter K ∈ {0, 1, ..., L − 1}: K = 0 corresponds to an additive model with independent contributions from each locus, while K = L − 1 corresponds to an HoC model. The dimension of interaction is K + 1.

A.2.5 The Ising and the IMF models 


This model originates from statistical physics (Mézard et al., 1987), but has an immediate interpretation in terms of pairwise allele incompatibilities. In this model, each pair of interacting loci with different alleles causes a reduction in fitness. Here, loci interact only if they are neighbors in the genotype sequence (locus i interacts only with loci i − 1 and i + 1), and the first and the last locus have a single interaction (loci are arranged on a string). The fitness cost for each pair is drawn from a Gaussian with mean μ_c and variance σ_c². More general models based on allelic incompatibilities correspond to the Sherrington-Kirkpatrick model and other spin glass models in statistical physics (Mézard et al., 1987). The dimension of interaction is 2, as interactions only occur between pairs. We also combined incompatibility interactions (Ising model) with independent fitness contributions (additive model) in an “Ising Mount Fuji” (IMF) model, built in the same way as the RMF.

In terms of the Fourier decomposition, all coefficients of third and higher order are zero in this landscape.


A.2.6 The Eggbox and the EMF models 

This model represents the extreme example of reciprocal sign epistasis of the highest dimension. In this model, all genotypes in the landscape have either low or high fitness. All the neighbours of a high-fitness genotype have low fitness, and vice versa. Therefore, in this landscape, each mutation is either deleterious (from high to low fitness) or compensatory (from low to high fitness). Fitnesses are drawn from a Gaussian with mean ±μ_E and a small variance σ_E². The dimension of interactions in this landscape is L. We also combine the eggbox interactions with independent contributions in an “Eggbox Mount Fuji” (EMF) landscape, built like the RMF or the IMF.

In terms of the Fourier decomposition, this landscape is dominated by the contribution of the Lth-order term (that is, the term of highest order).
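The models of this section can be generated in a few lines. The sketch below follows the verbal descriptions above with some explicit assumptions. Genotypes are integers whose bit i is A_i. The Ising term is written as Σ J_i S_i S_{i+1} on an open string, so a pair with different alleles scores −J_i instead of +J_i. Eggbox fitnesses are ±μ_E plus Gaussian noise, with the sign given by the parity of the genotype. Mixed models (RMF, IMF, EMF) are plain sums of an additive and an epistatic component. All names are hypothetical, and the NK model is omitted for brevity.

```python
import numpy as np


def alleles(L):
    """2**L x L matrix of alleles A_i in {0, 1}; row g is genotype g."""
    return (np.arange(2**L)[:, None] >> np.arange(L)) & 1


def additive(L, mu_a, sigma_a, rng):
    return alleles(L) @ rng.normal(mu_a, sigma_a, size=L)


def hoc(L, sigma_hoc, rng):
    return rng.normal(0.0, sigma_hoc, size=2**L)


def ising(L, mu_c, sigma_c, rng):
    S = 2 * alleles(L) - 1                                  # S_i = 2 A_i - 1
    J = rng.normal(mu_c, sigma_c, size=L - 1)               # couplings of neighbouring loci
    return (S[:, :-1] * S[:, 1:]) @ J


def eggbox(L, mu_e, sigma_e, rng):
    parity = alleles(L).sum(axis=1) % 2                     # neighbours have opposite parity
    return mu_e * (1 - 2 * parity) + rng.normal(0.0, sigma_e, size=2**L)


rng = np.random.default_rng(8)
L = 5
rmf = additive(L, 1.0, 0.1, rng) + hoc(L, 1.0, rng)
imf = additive(L, 1.0, 0.1, rng) + ising(L, 1.0, 0.2, rng)
emf = additive(L, 1.0, 0.1, rng) + eggbox(L, 1.0, 0.2, rng)
```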
A.3 Formulae for γ_d in model landscapes

We assume an observed fitness f(g) given by the model

$$f(g) = \sum_{i=1}^{L} \mu_i A_i + f_e(g) + \varepsilon_g \qquad (19)$$

that is, an additive contribution $\sum_{i} \mu_i A_i$, an epistatic contribution f_e(g) from some fitness model, plus the effect of measurement errors ε_g. We assume these errors to be unbiased and uncorrelated: E[ε_g] = 0, Cov[ε_g, ε_{g'}] = δ_{gg'} σ_g², where δ_{gg'} is the Kronecker delta, i.e. δ_{gg'} = 1 when g = g' and 0 otherwise.

We define the mean squared additive effect $\overline{\mu^2} = \frac{1}{L}\sum_{i=1}^{L} \mu_i^2$ and the mean squared experimental error $\sigma_\varepsilon^2 = 2^{-L}\sum_{g} \sigma_g^2$.

The expected value of γ_d can be computed approximately by taking the ratio of the expected values of the numerator and denominator of eq. (10), in its rearranged form, instead of the expected value of the ratio (the ≈ sign in all our formulae refers to this approximation). The result is

$$\mathbb{E}[\gamma_d] \approx 1 - \frac{\mathbb{E}\big[(s_e(g) - s_e(g_d))^2\big] + 4\sigma_\varepsilon^2}{2\overline{\mu^2} + 2\,\mathbb{E}\big[s_e(g)^2\big] + 4\sigma_\varepsilon^2} \qquad (20)$$

with $s_e(g) = f_e(g^{[j]}) - f_e(g)$ being the analogue of s(g) restricted to the epistatic contribution. Note that $\mathbb{E}[(s_e(g) - s_e(g_d))^2] = 2\,(1 - \mathbb{E}[\gamma_{e,d}])\,\mathbb{E}[s_e(g)^2]$ if γ_{e,d} is the γ_d statistic of the epistatic contribution f_e(g).


A.3.1 Additive model 


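In this model f_e = 0 and the only reduction in correlation is due to experimental noise:

$$\mathbb{E}[\gamma_d] \approx 1 - \frac{2\sigma_\varepsilon^2}{\overline{\mu^2} + 2\sigma_\varepsilon^2} \qquad (21)$$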


A.3.2 RMF and HoC models 


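In these models, $f_e(g)$ corresponds to the HoC model, i.e. the $f_e(g)$ are i.i.d. random variables. Denote by $\sigma_{\mathrm{HoC}}^2$ the variance of the distribution of fitnesses in the HoC model. The difference $s_e(g) - s_e(g_d)$ involves four independent HoC values, so its mean square is $4\sigma_{\mathrm{HoC}}^2$, while $\mathbb{E}[(s_e(g))^2] = 2\sigma_{\mathrm{HoC}}^2$; hence

$$\mathbb{E}[\gamma_d] \simeq 1 - \frac{2\sigma_{\mathrm{HoC}}^2 + 2\sigma_\varepsilon^2}{\overline{\mu^2} + 2\sigma_{\mathrm{HoC}}^2 + 2\sigma_\varepsilon^2} \tag{22}$$

Since $\overline{\mu^2} = \mu_A^2 + \sigma_A^2$ for a Gaussian distribution of additive fitness effects with mean $\mu_A$ and variance $\sigma_A^2$, we recover the corresponding equation of the main text for $d = 1$ and $\sigma_\varepsilon^2 = 0$.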
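Eq. (22) can be checked with a small Monte Carlo experiment, sketched here under stated assumptions: RMF landscapes are generated from eq. (19) with identical additive effects $\mu_i = \mu$ (so that $\overline{\mu^2} = \mu^2$), i.i.d. Gaussian HoC values and i.i.d. Gaussian measurement noise; $\gamma_d$ is evaluated exactly on each realization and the average is compared with the formula. The helper names are hypothetical, and the enumerator is repeated so that the block runs on its own. Because eq. (22) replaces the expectation of a ratio by a ratio of expectations, only approximate agreement should be expected.

```python
import itertools

import numpy as np


def gamma_d(f, L, d):
    """Exact gamma_d of one landscape, by exhaustive enumeration of g, j and D."""
    g = np.arange(2 ** L)
    num = den = 0.0
    for j in range(L):
        s = f[g ^ (1 << j)] - f
        for D in itertools.combinations([i for i in range(L) if i != j], d):
            s_d = s[g ^ sum(1 << i for i in D)]
            num += np.sum((s - s_d) ** 2)
            den += 2.0 * np.sum(s ** 2)
    return float(1.0 - num / den)


def rmf_landscape(L, mu, sigma_hoc, sigma_eps, rng):
    """Eq. (19) with equal additive effects mu, i.i.d. HoC values and i.i.d. noise."""
    n_mut = ((np.arange(2 ** L)[:, None] >> np.arange(L)) & 1).sum(axis=1)
    return mu * n_mut + rng.normal(0.0, sigma_hoc, 2 ** L) + rng.normal(0.0, sigma_eps, 2 ** L)


def rmf_gamma(mu2, var_hoc, var_eps):
    """Eq. (22); the prediction does not depend on the distance d >= 1."""
    return 1.0 - (2 * var_hoc + 2 * var_eps) / (mu2 + 2 * var_hoc + 2 * var_eps)


rng = np.random.default_rng(1)
L, d = 8, 2
estimate = np.mean([gamma_d(rmf_landscape(L, 1.0, 1.0, 0.5, rng), L, d) for _ in range(30)])
print(estimate, rmf_gamma(1.0, 1.0, 0.25))
```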


A.3.3 NK models 

In these models, $f_e(g) = \sum_{i=1}^{L} F_i$, where the $F_i$ are i.i.d. random variables that depend on the allele at locus $i$ and on the alleles at $K$ other loci, randomly chosen. The $F_i$ have mean $f_0$ and variance $\sigma_{NK}^2$. The fitness correlation function $\rho_d$ is known exactly for the pure NK model (Campos et al., 2002):

$$\rho_d = \frac{(L-K-1)!\,(L-d)!}{L!\,(L-K-d-1)!}$$

with $\rho_d = 0$ for $d > L-K-1$. The variance $\mathbb{E}[(s_e(g))^2]$ is given by $2(K+1)\sigma_{NK}^2$, because on average $K+1$ of the $F_i$ change in a single mutation, each of these differences having twice the variance of the fitness contribution (since the variance of the difference of two i.i.d. variables is twice their variance). With eq. (11) it is straightforward to compute $\mathbb{E}[\gamma_d^e] = (\rho_d - \rho_{d+1})/(1 - \rho_1)$ from this. The final result is

$$\mathbb{E}[\gamma_d] \simeq 1 - \frac{2(K+1)\sigma_{NK}^2\left[1 - \dfrac{\rho_d - \rho_{d+1}}{1 - \rho_1}\right] + 2\sigma_\varepsilon^2}{\overline{\mu^2} + 2(K+1)\sigma_{NK}^2 + 2\sigma_\varepsilon^2} \tag{23}$$

Substituting $\overline{\mu^2} = \sigma_\varepsilon^2 = 0$ and $d = 1$ yields the corresponding equation of the main text.
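The NK result chains together three short formulas, sketched below in plain Python with hypothetical function names. As an assumption of convenience, `nk_rho` uses the equivalent binomial form $\rho_d = \binom{L-K-1}{d}/\binom{L}{d}$ of the factorial expression above, which also returns 0 when $d > L-K-1$.

```python
from math import comb


def nk_rho(L, K, d):
    """Fitness correlation at distance d in the NK model with random neighbourhoods."""
    return comb(L - K - 1, d) / comb(L, d)


def nk_gamma_e(L, K, d):
    """Eq. (11) for the epistatic part: (rho_d - rho_{d+1}) / (1 - rho_1)."""
    return (nk_rho(L, K, d) - nk_rho(L, K, d + 1)) / (1.0 - nk_rho(L, K, 1))


def nk_gamma(L, K, d, mu2, var_nk, var_eps):
    """Eq. (23), for 1 <= d <= L - 1 and 0 <= K <= L - 1."""
    var_se = 2 * (K + 1) * var_nk  # E[(s_e(g))^2]
    num = var_se * (1.0 - nk_gamma_e(L, K, d)) + 2 * var_eps
    den = mu2 + var_se + 2 * var_eps
    return 1.0 - num / den


# Pure NK landscape (no additive part, no noise), L = 10, K = 2, d = 1.
print(nk_gamma(10, 2, 1, 0.0, 1.0, 0.0))
```

For $d = 1$ and a pure NK landscape the expression simplifies algebraically to $(L-K-1)/(L-1)$, i.e. $7/9$ in the example; $K = 0$ gives the additive value 1 and $K = L-1$ (the HoC limit) gives 0.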


A.3.4 Ising model 


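In the following, we define $s_i = 2g_i - 1 \in \{-1, +1\}$. Using the above notations, in these models $f_e(g) = \sum_{i=1}^{L-1} J_i s_i s_{i+1}$, where the incompatibility coefficients $J_i$ are randomly extracted from a Gaussian distribution with mean $\mu_c$ and variance $\sigma_c^2$. We define $\overline{J^2} = \mathbb{E}[J_i^2] = \mu_c^2 + \sigma_c^2$.

A mutation at locus $i$ inverts the contributions of the terms containing $J_i$ and $J_{i-1}$, adding $8\overline{J^2}$ to $\mathbb{E}[(s_e(g))^2]$. At the edges of the genome, loci interact with only one neighbour, so this is reduced by a factor of 2; thus $\mathbb{E}[(s_e(g))^2] = 8(L-1)\overline{J^2}/L$.

Only mutations at $j-1$ and $j+1$ affect the value of $s_j(g) - s_j(g_d)$. Choosing $d$ mutations out of $L-1$ and applying the hypergeometric distribution, there are probabilities

$$\frac{\binom{L-3}{d-2}}{\binom{L-1}{d}} \qquad\text{and}\qquad \frac{2\binom{L-3}{d-1}}{\binom{L-1}{d}}$$

to choose both or exactly one of them, respectively. Each relevant mutation changes the effect of mutating $j$ by $\pm 4J$, so two relevant mutations contribute twice as much as one. Thus (ignoring boundaries)

$$\mathbb{E}[(s_e(g) - s_e(g_d))^2] \simeq 16\overline{J^2}\,\frac{2\binom{L-3}{d-2} + 2\binom{L-3}{d-1}}{\binom{L-1}{d}} = 32\overline{J^2}\,\frac{d}{L-1}.$$

On the boundary only one mutation can influence $s_j$, resulting in a reduced contribution of $16\overline{J^2}\,d/(L-1)$, and so together:

$$\mathbb{E}[(s_e(g) - s_e(g_d))^2] = 32\overline{J^2}\,\frac{d}{L-1}\,\frac{L-1}{L} = 32\overline{J^2}\,\frac{d}{L}$$

and therefore the result is

$$\mathbb{E}[\gamma_d] \simeq 1 - \frac{16d\,\overline{J^2}/L + 2\sigma_\varepsilon^2}{\overline{\mu^2} + 8(L-1)\overline{J^2}/L + 2\sigma_\varepsilon^2} \tag{24}$$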
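When all $2^L$ genotypes are enumerated, the cross terms (such as $J_{j-1}J_j s_{j-1}s_{j+1}$) average out exactly, so for a single pure Ising landscape ($\overline{\mu^2} = \sigma_\varepsilon^2 = 0$) eq. (24) can be compared directly with an exhaustive computation, provided $\overline{J^2}$ is taken as the empirical mean of the $J_i^2$; both then reduce to $1 - 2d/(L-1)$, whatever the $J_i$. The sketch below assumes the sign convention $f_e(g) = \sum_i J_i s_i s_{i+1}$ (the overall sign does not affect $\gamma_d$), illustrative values of $\mu_c$ and $\sigma_c$, and hypothetical helper names.

```python
import itertools

import numpy as np


def gamma_d(f, L, d):
    """Exact gamma_d of one landscape, by exhaustive enumeration of g, j and D."""
    g = np.arange(2 ** L)
    num = den = 0.0
    for j in range(L):
        s = f[g ^ (1 << j)] - f
        for D in itertools.combinations([i for i in range(L) if i != j], d):
            s_d = s[g ^ sum(1 << i for i in D)]
            num += np.sum((s - s_d) ** 2)
            den += 2.0 * np.sum(s ** 2)
    return float(1.0 - num / den)


def ising_landscape(J):
    """f_e(g) = sum_i J_i s_i s_{i+1} with s_i = 2 g_i - 1, on L = len(J) + 1 loci."""
    L = len(J) + 1
    s = 2 * ((np.arange(2 ** L)[:, None] >> np.arange(L)) & 1) - 1
    return (s[:, :-1] * s[:, 1:]) @ np.asarray(J, dtype=float)


def ising_gamma(L, d, mu2, J2, var_eps):
    """Eq. (24)."""
    return 1.0 - (16 * d * J2 / L + 2 * var_eps) / (mu2 + 8 * (L - 1) * J2 / L + 2 * var_eps)


rng = np.random.default_rng(0)
L = 7
J = rng.normal(0.1, 1.0, L - 1)  # illustrative mu_c = 0.1, sigma_c = 1
f = ising_landscape(J)
for d in (1, 2, 3):
    print(d, gamma_d(f, L, d), ising_gamma(L, d, 0.0, np.mean(J ** 2), 0.0))
```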


A.3.5 Eggbox model 

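In this model, $f_e(g) = f_0 + \frac{\mu_E}{2}(-1)^{\sum_i g_i}$, i.e. each mutation switches the fitness from the highest value ($f_0 + \mu_E/2$) to the lowest ($f_0 - \mu_E/2$) or the other way. The epistatic effect of a mutation is therefore $s_e(g) = \pm\mu_E$; the epistatic fitness effects on two genotypes separated by an odd number of mutations differ by $\pm 2\mu_E$, while they coincide for two genotypes separated by an even number of mutations, therefore

$$\mathbb{E}[\gamma_d] \simeq 1 - \frac{\mu_E^2\,(1 - (-1)^d) + 2\sigma_\varepsilon^2}{\overline{\mu^2} + \mu_E^2 + 2\sigma_\varepsilon^2} \tag{25}$$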
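For an EMF landscape without noise (additive effects plus eggbox epistasis), the cross term between additive and epistatic effects sums to zero over the full hypercube, so eq. (25) should hold exactly when $\overline{\mu^2}$ is the mean of the $\mu_i^2$. The sketch below checks this by enumeration; the additive effects and helper names are illustrative assumptions, and the enumerator is repeated to keep the block self-contained.

```python
import itertools

import numpy as np


def gamma_d(f, L, d):
    """Exact gamma_d of one landscape, by exhaustive enumeration of g, j and D."""
    g = np.arange(2 ** L)
    num = den = 0.0
    for j in range(L):
        s = f[g ^ (1 << j)] - f
        for D in itertools.combinations([i for i in range(L) if i != j], d):
            s_d = s[g ^ sum(1 << i for i in D)]
            num += np.sum((s - s_d) ** 2)
            den += 2.0 * np.sum(s ** 2)
    return float(1.0 - num / den)


def emf_landscape(mu, mu_E):
    """Additive effects mu_i plus eggbox epistasis (mu_E / 2) (-1)^{sum_i g_i}."""
    L = len(mu)
    bits = (np.arange(2 ** L)[:, None] >> np.arange(L)) & 1
    return bits @ np.asarray(mu, dtype=float) + 0.5 * mu_E * (-1.0) ** bits.sum(axis=1)


def eggbox_gamma(d, mu2, mu_E, var_eps):
    """Eq. (25)."""
    return 1.0 - (mu_E ** 2 * (1 - (-1) ** d) + 2 * var_eps) / (mu2 + mu_E ** 2 + 2 * var_eps)


mu = np.array([0.2, 0.5, 0.8, 1.1, 1.4, 1.7])
f = emf_landscape(mu, 1.0)
for d in (1, 2, 3):
    print(d, gamma_d(f, 6, d), eggbox_gamma(d, np.mean(mu ** 2), 1.0, 0.0))
```

With $\overline{\mu^2} = 0$ the formula alternates between $-1$ and $1$, the signature of a landscape dominated by the highest-order Fourier term.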
A.4 Formulae for chains in RMF


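We consider an RMF model with equal additive fitness contribution $s$ for each mutation, in addition to an HoC model with distribution $p(f)$ and cumulative distribution $C(f) = \int_{-\infty}^{f} p(x)\,dx$.

Sorting the genotypes in order of their additive fitness, there are $\binom{L}{k}$ genotypes at the $k$th level, with $L-k$ mutations with positive additive fitness effect and $k$ with negative one. The average number of chain steps can be obtained as the sum over all possible steps of the probability of being a chain step. It is

$$\begin{aligned}
\#\mathrm{steps} &= \sum_{k=0}^{L}\binom{L}{k}\Big[k\int dx\,p(x)\,(1 - C(x+s))\,C(x+s)^{k-1}C(x-s)^{L-k} \\
&\qquad + (L-k)\int dx\,p(x)\,(1 - C(x-s))\,C(x-s)^{L-k-1}C(x+s)^{k}\Big] \\
&= L\int dx\,p(x)\,[2 - C(x-s) - C(x+s)]\,[C(x+s) + C(x-s)]^{L-1}
\end{aligned} \tag{26}$$

where the first term counts steps through one of the $k$ mutations with negative additive effect, the second through one of the $L-k$ mutations with positive effect, and the last equality follows from the binomial theorem.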
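Eq. (26) is a one-dimensional integral and is easy to evaluate numerically. The sketch below assumes a standard Gaussian HoC distribution $p(f)$ (any other distribution could be substituted) and a plain trapezoidal rule, and compares the result with a direct count on simulated RMF landscapes $f(g) = s\,|g| + h(g)$, where $|g|$ is the number of mutations and each genotype with exactly one fitter neighbour contributes one chain step. The function names and grid parameters are hypothetical.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf)


def steps_formula(L, s, lim=10.0, n=20001):
    """Eq. (26) for a standard Gaussian HoC distribution p(x)."""
    x = np.linspace(-lim, lim, n)
    p = np.exp(-0.5 * x ** 2) / math.sqrt(2.0 * math.pi)
    c_plus = 0.5 * (1.0 + _erf((x + s) / math.sqrt(2.0)))   # C(x + s)
    c_minus = 0.5 * (1.0 + _erf((x - s) / math.sqrt(2.0)))  # C(x - s)
    y = L * p * (2.0 - c_minus - c_plus) * (c_plus + c_minus) ** (L - 1)
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)  # trapezoidal rule


def steps_simulated(L, s, n_landscapes, rng):
    """Average number of genotypes with exactly one fitter neighbour."""
    g = np.arange(2 ** L)
    neighbours = g[:, None] ^ (1 << np.arange(L))[None, :]
    n_mut = np.array([bin(x).count("1") for x in g])
    total = 0
    for _ in range(n_landscapes):
        f = s * n_mut + rng.normal(size=2 ** L)
        n_fitter = (f[neighbours] > f[:, None]).sum(axis=1)
        total += np.count_nonzero(n_fitter == 1)
    return total / n_landscapes


print(steps_formula(4, 0.0), 2 ** 4 / 5)  # HoC limit 2^L / (L + 1)
print(steps_formula(6, 0.5), steps_simulated(6, 0.5, 4000, np.random.default_rng(2)))
```

For $s = 0$ the integral reduces to the HoC value $2^L/(L+1)$; for $s > 0$ the simulated average should fluctuate around the integral, since eq. (26) involves only nearest neighbours and is exact on the hypercube.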

The number of origins and of chain trees can be obtained in a similar way using the Cayley tree/Bethe lattice approximation. In this framework, we approximate the hypercube locally by a Cayley tree, i.e. a tree with $L$ branches at each node. This means that we neglect the overlap between the next-to-nearest neighbours of the genotype considered, and we assume them to be $L(L-1)$ independent genotypes instead of $L(L-1)/2$.

The probability that a genotype is the origin of a chain is the product of the probability of having out-degree 1 and the probability that all the other neighbours of lower fitness have out-degree different from 1:


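$$\begin{aligned}
\#\mathrm{origins} = \sum_{k=0}^{L}\binom{L}{k}\Big[&(L-k)\int dx\,p(x)\,(1 - C(x-s))\,\big(C(x-s) - p_+(x,s,k)\big)^{L-k-1}\big(C(x+s) - p_-(x,s,k)\big)^{k} \\
&+ k\int dx\,p(x)\,(1 - C(x+s))\,\big(C(x+s) - p_-(x,s,k)\big)^{k-1}\big(C(x-s) - p_+(x,s,k)\big)^{L-k}\Big]
\end{aligned} \tag{27}$$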


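where we define the probabilities that a neighbour genotype, reached by a mutation with positive ($p_+$) or negative ($p_-$) additive effect, is the starting point of a chain step ending at the focal genotype of level $k$ and HoC fitness $x$:

$$p_+(x,s,k) = \int_{-\infty}^{x-s} dy\,p(y)\,C(y+s)^{k}\,C(y-s)^{L-k-1} \tag{28}$$

$$p_-(x,s,k) = \int_{-\infty}^{x+s} dy\,p(y)\,C(y+s)^{k-1}\,C(y-s)^{L-k} \tag{29}$$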

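The probability that a genotype is the endpoint of a chain is the difference between the probability of being the final genotype of a chain step and the probability of being an intermediate point in a chain. Writing $\#\mathrm{endpoints}$ for the number of genotypes that are the final genotype of at least one chain step, the results are

$$\#\mathrm{endpoints} = \sum_{k=0}^{L}\binom{L}{k}\int dx\,p(x)\,\Big[1 - \big(1 - p_+(x,s,k)\big)^{L-k}\big(1 - p_-(x,s,k)\big)^{k}\Big] \tag{30}$$

$$\begin{aligned}
\#\mathrm{intermediates}_+(s) = \sum_{k=0}^{L}\binom{L}{k}(L-k)\int dx\,p(x)\,(1 - C(x-s))\,\Big[&C(x-s)^{L-k-1}C(x+s)^{k} \\
&- \big(C(x-s) - p_+(x,s,k)\big)^{L-k-1}\big(C(x+s) - p_-(x,s,k)\big)^{k}\Big]
\end{aligned} \tag{31}$$

$$\#\mathrm{intermediates} = \#\mathrm{intermediates}_+(s) + \#\mathrm{intermediates}_+(-s) \tag{32}$$

$$\#\mathrm{chain\ trees} = \#\mathrm{endpoints} - \#\mathrm{intermediates} \tag{33}$$

where the term with $-s$ in eq. (32) counts, by the symmetry $k \leftrightarrow L-k$ of the sum, the intermediate points whose outgoing chain step goes through a mutation with negative additive effect.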

These formulae could be further simplified if $p(f)$ is the Gumbel distribution (Neidhart et al., 2014).
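The Cayley-tree formulas, eqs. (27) to (33), can likewise be evaluated by quadrature. The sketch below assumes a standard Gaussian $p(f)$, computes $p_\pm$ from eqs. (28) and (29) with a cumulative trapezoidal rule, and uses the fact that every chain step starts either at an origin or at an intermediate point, so that $\#\mathrm{origins} + \#\mathrm{intermediates} = \#\mathrm{steps}$, as a consistency check against eq. (26). Function names and grid parameters are hypothetical.

```python
import math

import numpy as np

X = np.linspace(-8.0, 8.0, 8001)                      # integration grid
P = np.exp(-0.5 * X ** 2) / math.sqrt(2.0 * math.pi)  # assumed p(x): standard Gaussian
_erf = np.vectorize(math.erf)


def C(z):
    """Cumulative distribution of p."""
    return 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))


def integrate(y):
    """Trapezoidal rule on the grid X."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(X)) / 2.0)


def cumulative(y):
    """Running trapezoidal integral of y from X[0] up to each grid point."""
    out = np.zeros_like(y)
    out[1:] = np.cumsum((y[1:] + y[:-1]) * np.diff(X)) / 2.0
    return out


def p_plus(s, k, L):
    """Eq. (28) on the grid; zero when level k has no positive-effect mutation."""
    if k == L:
        return np.zeros_like(X)
    y = P * C(X + s) ** k * C(X - s) ** (L - k - 1)
    return np.interp(X - s, X, cumulative(y))


def p_minus(s, k, L):
    """Eq. (29) on the grid; zero when level k has no negative-effect mutation."""
    if k == 0:
        return np.zeros_like(X)
    y = P * C(X + s) ** (k - 1) * C(X - s) ** (L - k)
    return np.interp(X + s, X, cumulative(y))


def intermediates_plus(L, s):
    """Eq. (31); the factor (L - k) removes the k = L term."""
    cp, cm = C(X + s), C(X - s)
    total = 0.0
    for k in range(L):
        pp, pm = p_plus(s, k, L), p_minus(s, k, L)
        y = P * (1 - cm) * (cm ** (L - k - 1) * cp ** k - (cm - pp) ** (L - k - 1) * (cp - pm) ** k)
        total += math.comb(L, k) * (L - k) * integrate(y)
    return total


def chain_counts(L, s):
    """Eqs. (26), (27), (30), (32) and (33) for an RMF landscape with additive step s."""
    cp, cm = C(X + s), C(X - s)
    steps = L * integrate(P * (2 - cm - cp) * (cp + cm) ** (L - 1))
    origins = endpoints = 0.0
    for k in range(L + 1):
        w, pp, pm = math.comb(L, k), p_plus(s, k, L), p_minus(s, k, L)
        if k < L:
            origins += w * (L - k) * integrate(P * (1 - cm) * (cm - pp) ** (L - k - 1) * (cp - pm) ** k)
        if k > 0:
            origins += w * k * integrate(P * (1 - cp) * (cp - pm) ** (k - 1) * (cm - pp) ** (L - k))
        endpoints += w * integrate(P * (1 - (1 - pp) ** (L - k) * (1 - pm) ** k))
    intermediates = intermediates_plus(L, s) + intermediates_plus(L, -s)
    return {"steps": steps, "origins": origins, "endpoints": endpoints,
            "intermediates": intermediates, "chain_trees": endpoints - intermediates}


print(chain_counts(5, 0.7))
```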


A.5 Proofs 

A.5.1 Relation between $\gamma$ and type of epistasis

The most extreme values of $\gamma$ for the different types of epistasis are obtained in the case $L = 2$. Denote by $f_{00}, f_{01}, f_{10}, f_{11}$ the log-fitness values. The function $\gamma$ is defined as

$$\gamma = \frac{2\left[(f_{11} - f_{10})(f_{01} - f_{00}) + (f_{11} - f_{01})(f_{10} - f_{00})\right]}{(f_{11} - f_{10})^2 + (f_{01} - f_{00})^2 + (f_{11} - f_{01})^2 + (f_{10} - f_{00})^2}$$

Since $\gamma$ is a correlation, $-1 \le \gamma \le 1$. $\gamma$ is a continuous function and it is also invariant under permutations of loci and alleles, so from now on we restrict to the subspace with $f_{00} < f_{01} < f_{10}$. Each of the three partitions of this subspace, corresponding to magnitude, sign and reciprocal sign epistasis, is connected; therefore the image of each of them under $\gamma$ is an interval.

By definition, magnitude epistasis results in $\gamma > 0$, since all fitness jumps have the same sign. The extreme values 0 and 1 are both realized: $\gamma = 0$ in landscapes with $f_{00} = f_{01} = f_{10} < f_{11}$, while $\gamma = 1$ in landscapes with $f_{11} = f_{10} + f_{01} - f_{00}$. Therefore, for magnitude epistasis, $0 < \gamma < 1$.

Reciprocal sign epistasis requires that all fitness jumps change sign after a mutation, and therefore results in $\gamma < 0$ by definition. The extreme case $\gamma = -1$ is realized in the landscape $f_{01} = f_{10} > f_{00} = f_{11}$, while the other extreme case $\gamma \to 0$ is realized for the landscapes with $f_{00} < f_{01} = f_{10} > f_{11}$ for $(f_{00} - f_{01}) \to 0$. Therefore, for reciprocal sign epistasis, $-1 \le \gamma < 0$.

Finally, sign epistasis can have both signs of $\gamma$. It is easy, although tedious, to show that there are no critical points of $\gamma$ inside the space of landscapes with sign epistasis (with $L = 2$); therefore the extremal values must appear on the border. There are essentially two borders: $(f_{11} - f_{10}) \to 0$ and $f_{11} = f_{01}$. On the first border, we have $\gamma > 0$, and the upper limit is $\gamma \to 1$ for the landscapes with $f_{00} = f_{01} < f_{10}$. On the second border, we have

$$\gamma = -\frac{2(f_{10} - f_{11})(f_{11} - f_{00})}{(f_{10} - f_{11})^2 + (f_{10} - f_{00})^2 + (f_{11} - f_{00})^2},$$

which reaches the minimum value $\gamma = -1/3$ at the landscapes with $f_{10} - f_{11} = f_{11} - f_{00}$ (imposing the derivative with respect to $f_{10}$ to be null). Therefore, for sign epistasis, $-1/3 < \gamma < 1$.
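The three ranges can be spot-checked numerically: evaluate $\gamma$ for the extremal landscapes mentioned above and for many random two-locus landscapes, classifying each one by whether the effect of each mutation keeps its sign on the two backgrounds. This is a minimal sketch; the Gaussian log-fitnesses are an arbitrary choice that avoids ties, and the function names are hypothetical.

```python
import random


def gamma_two_locus(f00, f01, f10, f11):
    """gamma for L = 2, as defined above."""
    num = 2 * ((f11 - f10) * (f01 - f00) + (f11 - f01) * (f10 - f00))
    den = (f11 - f10) ** 2 + (f01 - f00) ** 2 + (f11 - f01) ** 2 + (f10 - f00) ** 2
    return num / den


def epistasis_type(f00, f01, f10, f11):
    """Magnitude, sign or reciprocal sign epistasis of a two-locus landscape."""
    keep_first = (f10 - f00) * (f11 - f01) > 0   # mutation at the first locus keeps its sign
    keep_second = (f01 - f00) * (f11 - f10) > 0  # mutation at the second locus keeps its sign
    if keep_first and keep_second:
        return "magnitude"
    if not keep_first and not keep_second:
        return "reciprocal sign"
    return "sign"


print(gamma_two_locus(0, 1, 1, 0))  # reciprocal sign extreme: -1.0
print(gamma_two_locus(0, 1, 2, 1))  # border f11 = f01 with f10 - f11 = f11 - f00: -1/3

rng = random.Random(0)
lowest = {"magnitude": 1.0, "sign": 1.0, "reciprocal sign": 1.0}
highest = {"magnitude": -1.0, "sign": -1.0, "reciprocal sign": -1.0}
for _ in range(20000):
    f = [rng.gauss(0.0, 1.0) for _ in range(4)]
    t, gam = epistasis_type(*f), gamma_two_locus(*f)
    lowest[t], highest[t] = min(lowest[t], gam), max(highest[t], gam)
print(lowest, highest)
```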


A.5.2 Relation (11) between $\gamma_d$ and the fitness correlation function

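Denote the fitness correlation function at distance $d$ by $\rho_d = \mathrm{Cor}[f(g), f(g_d)]$. We use the identity

$$s_j(g)\,s_j(g[i_1 i_2 \ldots i_d]) = f(g[j])\,f(g[j i_1 i_2 \ldots i_d]) - f(g)\,f(g[j i_1 i_2 \ldots i_d]) - f(g[j])\,f(g[i_1 i_2 \ldots i_d]) + f(g)\,f(g[i_1 i_2 \ldots i_d]).$$

The pairs $(g[j], g[j i_1 \ldots i_d])$ and $(g, g[i_1 \ldots i_d])$ are at distance $d$, the other two at distance $d+1$, and the squared mean fitness cancels in the difference. Averaging over all mutations and all genotypes, the above terms therefore give rise to:

$$\begin{aligned}
\mathbb{E}[s_j(g)\,s_j(g[i_1 \ldots i_d])] &= 2\left(\mathbb{E}[f(g)f(g_d)] - \mathbb{E}[f(g)f(g_{d+1})]\right) \\
&= 2\left(\mathrm{Cov}[f(g), f(g_d)] - \mathrm{Cov}[f(g), f(g_{d+1})]\right) \\
&= 2(\rho_d - \rho_{d+1})\,\mathrm{Var}[f(g)].
\end{aligned}$$

Summing over genotypes and mutations, the numerator of (10) becomes $\binom{L-1}{d}\,2^L L \cdot 2(\rho_d - \rho_{d+1})\,\mathrm{Var}[f(g)]$. The denominator of (10) can be computed in a similar way by choosing $d = 0$, obtaining $\binom{L-1}{d}\,2^L L \cdot 2(\rho_0 - \rho_1)\,\mathrm{Var}[f(g)]$. $\gamma_d$ is their ratio,

$$\gamma_d = \frac{\rho_d - \rho_{d+1}}{\rho_0 - \rho_1},$$

and since $\rho_0 = \mathrm{Cor}[f(g), f(g)] = 1$, we obtain the result (11).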
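Relation (11) is an exact identity when all averages run over the whole hypercube with equal weights, so it can be verified to rounding error on any landscape. The sketch below assumes that $\rho_d$ is computed with population moments over all pairs of genotypes at Hamming distance $d$, uses an illustrative RMF-like landscape, and repeats the enumerator for $\gamma_d$; helper names are hypothetical.

```python
import itertools

import numpy as np


def gamma_d(f, L, d):
    """Left-hand side: gamma_d by exhaustive enumeration of g, j and D."""
    g = np.arange(2 ** L)
    num = den = 0.0
    for j in range(L):
        s = f[g ^ (1 << j)] - f
        for D in itertools.combinations([i for i in range(L) if i != j], d):
            s_d = s[g ^ sum(1 << i for i in D)]
            num += np.sum((s - s_d) ** 2)
            den += 2.0 * np.sum(s ** 2)
    return float(1.0 - num / den)


def rho_d(f, L, d):
    """Fitness correlation at distance d, with population moments over all pairs (g, g_D)."""
    g = np.arange(2 ** L)
    fc = f - f.mean()
    covs = [np.mean(fc * fc[g ^ sum(1 << i for i in D)]) for D in itertools.combinations(range(L), d)]
    return float(np.mean(covs) / np.mean(fc ** 2))


rng = np.random.default_rng(3)
L = 6
n_mut = ((np.arange(2 ** L)[:, None] >> np.arange(L)) & 1).sum(axis=1)
f = 0.5 * n_mut + rng.normal(size=2 ** L)  # illustrative RMF-like landscape
for d in range(1, L):
    rhs = (rho_d(f, L, d) - rho_d(f, L, d + 1)) / (1.0 - rho_d(f, L, 1))
    print(d, gamma_d(f, L, d), rhs)
```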


A.5.3 Relation (15) between $\gamma^*$ and the fractions of square motifs


We consider a landscape without ties (i.e. all genotypes have different fitness values). In this case $(s^*)^2 = 1$ and therefore $\gamma^* = \mathbb{E}[s^*(g) \cdot s^*(g_1)]$, where the average is over all genotypes and pairs of mutations, or equivalently over all square motifs and over their sides. The average over the two pairs of parallel sides of a motif is $\mathbb{E}[s^*(g) \cdot s^*(g_1)] = (1+1)/2 = 1$ with magnitude epistasis, $(1-1)/2 = 0$ with sign epistasis and $(-1-1)/2 = -1$ with reciprocal sign epistasis. To obtain the global average, we multiply these results by the fraction of motifs of each kind, i.e. $\gamma^* = 1 \cdot \varphi_m + 0 \cdot \varphi_s - 1 \cdot \varphi_{rs}$. Since these fractions sum to 1, we have $\varphi_m = 1 - \varphi_s - \varphi_{rs}$, and substituting it we obtain the relation (15), $\gamma^* = 1 - \varphi_s - 2\varphi_{rs}$.
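The identity can be confirmed by brute force: compute $\gamma^*$ directly from the signs $s^*$ of all mutational effects, classify every square motif by whether the effects of its two mutations keep or change sign, and compare $\gamma^*$ with $1 - \varphi_s - 2\varphi_{rs}$, which follows from substituting $\varphi_m = 1 - \varphi_s - \varphi_{rs}$ into $\gamma^* = \varphi_m - \varphi_{rs}$. A minimal sketch, assuming a tie-free random landscape with bit-encoded genotypes and hypothetical helper names.

```python
import itertools

import numpy as np


def gamma_star(f, L):
    """E[s*(g) s*(g_1)] over all genotypes g and ordered pairs of distinct loci."""
    g = np.arange(2 ** L)
    total, count = 0.0, 0
    for j in range(L):
        sj = np.sign(f[g ^ (1 << j)] - f)
        for i in range(L):
            if i != j:
                total += np.sum(sj * sj[g ^ (1 << i)])
                count += 2 ** L
    return float(total / count)


def motif_fractions(f, L):
    """Fractions of square motifs with magnitude, sign and reciprocal sign epistasis."""
    g = np.arange(2 ** L)
    counts = np.zeros(3)
    for i, j in itertools.combinations(range(L), 2):
        b = g[(((g >> i) & 1) == 0) & (((g >> j) & 1) == 0)]  # one base genotype per motif
        f00, f10, f01, f11 = f[b], f[b | (1 << i)], f[b | (1 << j)], f[b | (1 << i) | (1 << j)]
        keep_i = (f10 - f00) * (f11 - f01) > 0
        keep_j = (f01 - f00) * (f11 - f10) > 0
        counts += [np.sum(keep_i & keep_j), np.sum(keep_i ^ keep_j), np.sum(~keep_i & ~keep_j)]
    return counts / counts.sum()


rng = np.random.default_rng(4)
L = 6
f = rng.normal(size=2 ** L)
phi_m, phi_s, phi_rs = motif_fractions(f, L)
print(gamma_star(f, L), 1 - phi_s - 2 * phi_rs)
```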


A.5.4 Number of chain steps in the HoC model 

The average number of chain steps is given by the number of mutations in the landscape, $L \cdot 2^L$, multiplied by the probability that the mutation is a chain step. This is the probability that, among the initial genotype and all its neighbours, the final one is the fittest and the initial one is second in the fitness rank. Since fitness values are random and uncorrelated, the probability that a value is the maximum among $L+1$ is $1/(L+1)$, and the conditional probability that another given value is second is $1/L$; therefore the average number of steps is $L \cdot 2^L/(L(L+1)) = 2^L/(L+1)$.
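A quick Monte Carlo check of this count is sketched below, assuming i.i.d. standard Gaussian HoC fitnesses (any continuous distribution gives the same result). A genotype is the start of a chain step exactly when it has one fitter neighbour, so counting such genotypes in random landscapes should give on average $2^L/(L+1)$; the function name is hypothetical.

```python
import numpy as np


def mean_chain_steps_hoc(L, n_landscapes, rng):
    """Average number of genotypes with exactly one fitter neighbour in HoC landscapes."""
    g = np.arange(2 ** L)
    neighbours = g[:, None] ^ (1 << np.arange(L))[None, :]
    total = 0
    for _ in range(n_landscapes):
        f = rng.normal(size=2 ** L)
        n_fitter = (f[neighbours] > f[:, None]).sum(axis=1)
        total += np.count_nonzero(n_fitter == 1)
    return total / n_landscapes


L = 8
print(mean_chain_steps_hoc(L, 1000, np.random.default_rng(5)), 2 ** L / (L + 1))
```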


A.5.5 Relation (16) between $\gamma$ and the Fourier spectrum

We define $n(J,d) = \lfloor \min((J-2)/2,\,(d-1)/2) \rfloor$. Since the Fourier basis is orthonormal, each component of the spectrum gives an independent contribution to the numerator and denominator of $\gamma$. The contribution of each $B_J$ to the denominator of (10) is $4J$, since there are $J$ nonzero mutations with fitness effect $\pm 2a_{i_1 \ldots i_J}$ each. For the numerator, there are again $J$ contributions (from the nonzero mutations), multiplied by the square of the fitness effect of each term (averaged over the choice of the other $d$ mutations). The difference between the effects of the mutation on $g$ and on $g_d$ is $\pm 4a_{i_1 \ldots i_J}$ if an odd number of the $d$ mutations lie within the indices $i_1, \ldots, i_J$, and 0 otherwise. The probability that this number is odd is given by the sum of the odd terms of the hypergeometric distribution with parameters $d$, $J-1$, $L-1$; therefore we have

$$\mathbb{E}[\gamma_d] \simeq 1 - \frac{\sum_{J} 2J B_J \sum_{l=0}^{n(J,d)} \binom{J-1}{2l+1}\binom{L-J}{d-2l-1}\Big/\binom{L-1}{d}}{\sum_{J} J B_J} \tag{34}$$

and using the definition of $W_J$ we obtain equation (16).
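Eq. (34) can be compared with a direct evaluation of $\gamma_d$. The sketch below assumes that $B_J$ is the sum of the squared Walsh coefficients $a_S$ of order $|S| = J$ in the expansion $f(g) = \sum_S a_S \prod_{i \in S} s_i$ with $s_i = 2g_i - 1$ (a common normalization factor would cancel in the ratio); on a fully enumerated landscape the two computations should then agree to rounding error. Helper names are hypothetical and the enumerator is repeated to keep the block self-contained.

```python
import itertools
from math import comb

import numpy as np


def gamma_d(f, L, d):
    """gamma_d by exhaustive enumeration of g, j and D."""
    g = np.arange(2 ** L)
    num = den = 0.0
    for j in range(L):
        s = f[g ^ (1 << j)] - f
        for D in itertools.combinations([i for i in range(L) if i != j], d):
            s_d = s[g ^ sum(1 << i for i in D)]
            num += np.sum((s - s_d) ** 2)
            den += 2.0 * np.sum(s ** 2)
    return float(1.0 - num / den)


def walsh_power(f, L):
    """B_J = sum of a_S^2 over subsets S of size J, with a_S = mean_g f(g) prod_{i in S} s_i."""
    g = np.arange(2 ** L)
    s = 2 * ((g[:, None] >> np.arange(L)) & 1) - 1
    B = np.zeros(L + 1)
    for S in range(2 ** L):
        members = [i for i in range(L) if (S >> i) & 1]
        chi = np.prod(s[:, members], axis=1)  # Walsh function of subset S
        B[len(members)] += np.mean(f * chi) ** 2
    return B


def p_odd(J, d, L):
    """Odd terms of the hypergeometric distribution with parameters d, J - 1, L - 1."""
    odd = range(1, min(J - 1, d) + 1, 2)
    return sum(comb(J - 1, x) * comb(L - J, d - x) for x in odd) / comb(L - 1, d)


def gamma_from_spectrum(B, L, d):
    """Eq. (34)."""
    num = sum(2 * J * B[J] * p_odd(J, d, L) for J in range(1, L + 1))
    den = sum(J * B[J] for J in range(1, L + 1))
    return 1.0 - num / den


rng = np.random.default_rng(6)
L = 6
f = rng.normal(size=2 ** L)
B = walsh_power(f, L)
for d in range(1, L):
    print(d, gamma_d(f, L, d), gamma_from_spectrum(B, L, d))
```

First-order terms never contribute to the numerator (an odd number of the $d$ mutations cannot fall among $J - 1 = 0$ loci), which is why an additive landscape gives $\gamma_d = 1$.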
[Figure 8 panels: a) Additive ($\mu_A = 1$, $\sigma_A = 0.1$); b) House of Cards ($\sigma_H = 1$); e) Ising model ($\sigma_c = 0.1$); f) Eggbox ($\mu_E = 1$, $\sigma_E = 0.1$). Legend: genotype, peak, gain, loss.]

Figure 8: Models of fitness landscapes. Realizations of random landscapes obtained from the models discussed in the introduction, using Magellan (Brouillet et al., submitted).